ByteDance Working To Make It Faster Kexec Booting The Linux Kernel

  • sinepgib
    replied
    Originally posted by coder
    I guess you're disagreeing by way of agreeing with an obviously false analogy? Cute, but confusing.
    Yes, but I think context should suffice to show the sarcasm here. I promise I'll use /s next time.

  • coder
    replied
    Originally posted by sinepgib
    Efficiency is so old fashioned in the age of multicores.
    I guess you're disagreeing by way of agreeing with an obviously false analogy? Cute, but confusing.

  • coder
    replied
    Originally posted by cj.wijtmans
    Uptime is so old fashioned in the age of redundancy.
    The only time it truly doesn't matter is when you have hot-failover. Otherwise, you're having to restore from a snapshot and that can get costly.

    It's not directly comparable, but it's still instructive to look at what happened to RAID. Rebuild times have gotten so bad that it's basically at a dead end.

  • sinepgib
    replied
    Originally posted by cj.wijtmans
    Uptime is so old fashioned in the age of redundancy.
    Efficiency is so old fashioned in the age of multicores.

  • sinepgib
    replied
    Originally posted by lowflyer
    The link to the kernel list was there in the original Phoronix article.
    Fair, but was there discussion about the need at the point we started arguing here?

    Originally posted by lowflyer
    I work with the same principle at my job. But "99.9999% to 99.99999%" is not the same thing as "doubled the throughput"
    Well, I don't work on ensuring availability, so I can't give you real-life examples of that.

    Originally posted by lowflyer
    I'm not concerned for "Cookie Clicker" and similar mouse-key abusing games. My concerns are on the like- and vote-driven social media. If you have a plan to "shape public opinion", a click- or bot-farm that can "reboot quickly into new identities" can come in "very handy". That's where the nasty stuff begins.
    I see. That's a very good point.

    Originally posted by lowflyer
    I certainly encourage you to have a look at the kernel mailing list from time to time. I'm not a regular reader of the list. I usually only read some passages when I get a hint from Phoronix. It's not necessary to be an expert. But reading "directly from the source" is always enlightening in many ways. It's not about the technicalities, it's about reading the original voice, tone and intentions of the kernel's authors. In this case it's interesting to see the pushback that Albert receives. At the time of my reading, there was no hint of a "build option" yet. I think that would be the correct way to go with this pull request.
    Most of the time I only go there for things in LWN's weekly quotes or to look at an interesting patch. But from a distance this simply looked like a "cool but not that interesting" patch to me, and those threads can get long and eat lots of time.

  • cj.wijtmans
    replied
    Uptime is so old fashioned in the age of redundancy.

  • lowflyer
    replied
    Originally posted by sinepgib
    ... I don't agree there was evidence when this long thread started.
    The link to the kernel list was there in the original Phoronix article.

    Originally posted by sinepgib
    You save 200-500ms here and 200-500ms there and that compounds, so you get from 99.9999% to 99.99999% at some point. Most optimizations are like that: a few percent here, a few percent there. Over the last few weeks I doubled the throughput of the project I'm working on by doing exactly that.
    I work with the same principle at my job. But "99.9999% to 99.99999%" is not the same thing as "doubled the throughput". Let's look at realistic numbers, assuming a server farm where "uptime" is required to be above 99.999% (pretty much a standard offering, e.g. from IBM or Stratus):
    • 99.999% uptime is 0.001% downtime. This corresponds to about 5.25 minutes of downtime per year
    • If a reboot takes one minute, this allows five reboots per year
    • The 500 ms you could shave off each of those five reboots adds up to a gain of 2.5 seconds of uptime
    • Let's assume a reboot takes only 10 seconds; then we could do 31 reboots per year
    • If you improve the boot time in this case by 500 ms, we still only gain about 16 seconds of additional uptime
    If we want to achieve 99.9999% uptime the whole thing looks even worse. (I have heard about a Stratus offer, but I was not able to find it myself)
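The downtime-budget arithmetic in the list above can be double-checked with a few lines of Python. This is only a sketch: it assumes a 365-day year and a 0.5 s saving on every reboot, and the function names are made up for illustration. Note that 31 reboots at 0.5 s each is 15.5 s, which the post rounds to 16 s; the same loop also covers the 99.9999% case.

```python
# Sketch of the yearly downtime-budget arithmetic.
# Assumption: a 365-day year (leap years ignored). Function names are illustrative only.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_budget_seconds(availability_pct: float) -> float:
    """Allowed downtime per year, in seconds, for a given availability percentage."""
    return MINUTES_PER_YEAR * 60 * (1 - availability_pct / 100)


def reboots_per_year(availability_pct: float, reboot_seconds: float) -> int:
    """How many full reboots fit into the yearly downtime budget."""
    return int(downtime_budget_seconds(availability_pct) // reboot_seconds)


def uptime_gained(availability_pct: float, reboot_seconds: float, saved_seconds: float) -> float:
    """Seconds of downtime recovered per year if every allowed reboot gets faster."""
    return reboots_per_year(availability_pct, reboot_seconds) * saved_seconds


for pct in (99.999, 99.9999):
    budget = downtime_budget_seconds(pct)
    print(f"{pct}% -> {budget:.1f} s of downtime per year")
    for reboot in (60, 10):
        n = reboots_per_year(pct, reboot)
        gain = uptime_gained(pct, reboot, 0.5)
        print(f"  {reboot:>2} s reboot: {n} reboots/year, 0.5 s faster boot saves {gain:.1f} s")
```

At 99.9999% the whole yearly budget is only about 31.5 s, enough for three 10-second reboots, so a 0.5 s saving per boot recovers 1.5 s per year.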

    The whole argument of "this patch is important for server farms" goes out of the window with these figures, and that's before even considering the security impact it may have. So, what are the remaining use cases for rebooting quicker? IMHO this only pays off when you have to reboot *a lot* in rapid succession. This is where my fantasy starts roaming ...

    Originally posted by sinepgib
    Of course it is, but click fraud is not something nasty towards the general population or safety, or anything I'd deem worth worrying about that much. It may actually be beneficial if it manages to confuse the marketing machine.
    I'm not concerned for "Cookie Clicker" and similar mouse-key abusing games. My concerns are on the like- and vote-driven social media. If you have a plan to "shape public opinion", a click- or bot-farm that can "reboot quickly into new identities" can come in "very handy". That's where the nasty stuff begins.

    Originally posted by sinepgib
    I should definitely go and read the thread soon. That does sound terrible. It probably won't affect you and me, but kexec users would be at big risk. Is that a build-time option at least?
    I certainly encourage you to have a look at the kernel mailing list from time to time. I'm not a regular reader of the list. I usually only read some passages when I get a hint from Phoronix. It's not necessary to be an expert. But reading "directly from the source" is always enlightening in many ways. It's not about the technicalities, it's about reading the original voice, tone and intentions of the kernel's authors. In this case it's interesting to see the pushback that Albert receives. At the time of my reading, there was no hint of a "build option" yet. I think that would be the correct way to go with this pull request.

  • sinepgib
    replied
    Originally posted by lowflyer
    Perhaps I was misunderstood. Perhaps I did not express myself clearly. (English is not my mother tongue)
    Neither is mine, so things are twice as likely to get lost in translation.

    Originally posted by lowflyer
    I never said "OpenSSL is the same as Linux". I said "similar things that happened to OpenSSL also happened to Linux". And it did.
    We agree on that. I just answered that the same thing happened in significantly different circumstances.

    Originally posted by lowflyer
    You mentioned a patch of yours that *also* got only two reviews.
    I did, because OP was also painting an incomplete picture: the kernel does get much more review... in key areas. MTD, for simple cases where you've already decided to open a big security hole by enabling debugfs, is not considered that much of a key area. (My patch was solely about migrating from some proprietary damaged-sector marking in a NAND flash to mainline; it added a knob to debugfs to force a clean slate, which is risky and meant only for initial hardware bring-up.) The memory management subsystem definitely gets a lot more care. I don't know which subsystem takes care of kexec.
    But all in all, I try to keep the conversation honest, so I won't pretend the kernel landscape is ideal just to win an argument.

    Originally posted by lowflyer
    I mentioned the UMN case.
    Yes, and I responded that the patches didn't get in, and that they aren't necessarily representative of how a whole country operates. I also agreed that, since then, patches from UMN should at the very least be reviewed with full-on paranoia, because now there's evidence of malice.

    Originally posted by lowflyer
    And there are other cases too, just dig deep enough into the kernel mailing list archive. Linux was just "lucky" compared to OpenSSL.
    I'll take your word for that; I don't sense an intention to bullshit from you, and I don't have the time to double-check. But I'm not surprised that this is the case.

    Originally posted by lowflyer
    What you say about OpenSSL is common knowledge now - but it is knowledge after the fact. Nobody would have questioned how they conducted their code reviews before Heartbleed. The public impression at that time was: "heck, they are even certified". A false impression in hindsight. IIRC, the questionable commit was in their repository for almost two years. I certainly hope that we're doing better on Linux by now. The mailing list discussion gives me reason to hope.
    I'm certainly aware of that. But at the very least it's different in the sheer number of contributors, many with conflicting interests, which in some way helps regulate what gets in, as everyone tries to protect themselves from the others.

    Originally posted by lowflyer
    (Please note that I'm *not* saying that their certification was worth something and I'm also not saying anything about the quality of their code reviews)
    Yeah, certifications, be they for projects, companies or people, tend to be just theater IMO.

    Originally posted by lowflyer
    On the contrary, I think we mostly agree. We even agree that China is not to be trusted blindly.
    Certainly we do. I just extend it further: even though I trust China (given its authoritarian government mixed with involvement in private companies) even less than, say, a Western company, I trust neither blindly.

    Originally posted by lowflyer
    Where we disagree is "assuming malice" or "ill will". Perhaps I'm misrepresenting you (correct me if I'm wrong), but I get the impression you "always assume best interests, at least for new actors - regardless of any question marks (or until we have evidence)".
    You paint me pretty much right, except for the "regardless of any question marks". The "until we have evidence" is closer. "Best interests" sounds like rather a strong assumption, though; I don't assume altruism, which is what that sounds like. I assume selfish motivations, but those aren't necessarily hostile ones.

    Originally posted by lowflyer
    I, on the other hand, am not willing to ignore question marks. I don't call that "assuming malice". To me, this is just "normal prudence".
    Ignoring question marks is of course reckless, as you pointed out. I just don't treat them as anything special, only as something to scrutinize. If we don't get satisfactory answers after asking, then we might suspect malice. In my book "normal prudence" really applies to anyone. "Special prudence", however, applies when you have evidence of malice (or of incompetence, but again, incompetent people can be helped to become competent, and a few rounds of sending them back to fix their patches should make it OK). So, say, Facebook merits "normal prudence", while UMN merits "special prudence", if it's given a chance at all after its misbehavior.

    Originally posted by lowflyer
    I think there is evidence enough:
    • the usefulness of the pull request is very questionable. It raised more than a few eyebrows on the mailing list
    • the pull request may have the potential to eventually open new attack vectors
    • Albert (the original author of the pull request) did not answer a direct question about the use on specific hardware and was quite vague about the use case
    • It is (quite) common knowledge that there is a dark side to "Chinese involvement" that may backfire
    We do agree there's evidence now, as I mentioned a few posts back. I don't agree there was evidence when this long thread started.

    Originally posted by lowflyer
    Each of these questions in isolation is not alarming. But the combination of all should at least raise attention levels.
    Indeed.

    Originally posted by lowflyer
    So you save 200 - 500ms on reboot once per month/week? That'll improve your uptime from 99.99990001% to 99.99990002%
    You save 200-500ms here and 200-500ms there and that compounds, so you get from 99.9999% to 99.99999% at some point. Most optimizations are like that: a few percent here, a few percent there. Over the last few weeks I doubled the throughput of the project I'm working on by doing exactly that.

    Originally posted by lowflyer
    Ad fraud is still fraud. But let your fantasy roam...
    Of course it is, but click fraud is not something nasty towards the general population or safety, or anything I'd deem worth worrying about that much. It may actually be beneficial if it manages to confuse the marketing machine.

    Originally posted by lowflyer
    Yes, perhaps. But wasn't that the same attitude that brought us Heartbleed? Let's not put the cart before the horse: "Not applying that patch is not a measure to avoid shenanigans". My real concern is about the negation of this sentence: "Applying that patch can be used to avoid detection of shenanigans". The pull request and its comments say this is *exactly* what this pull request intends to do (at least in my limited understanding):
    • removing a sha256 check
    • using a possibly tainted kernel image to boot from
    Again, each single point on its own is not so concerning. But both of them in combination, permanently, perhaps in all future Linux kernels?
    I should definitely go and read the thread soon. That does sound terrible. It probably won't affect you and me, but kexec users would be at big risk. Is that a build-time option at least?

  • lowflyer
    replied
    Originally posted by sinepgib
    ... Someone explicitly said the kernel works one way, you gave an example from a different project. ...
    Originally posted by sinepgib
    Of course, as you said, oversights can and do happen tho, they're just less likely than they were for OpenSSL, ...
    Perhaps I was misunderstood. Perhaps I did not express myself clearly. (English is not my mother tongue) I never said "OpenSSL is the same as Linux". I said "similar things that happened to OpenSSL also happened to Linux". And it did. You mentioned a patch of yours that *also* got only two reviews. I mentioned the UMN case. And there are other cases too, just dig deep enough into the kernel mailing list archive. Linux was just "lucky" compared to OpenSSL.

    What you say about OpenSSL is common knowledge now - but it is knowledge after the fact. Nobody would have questioned how they conducted their code reviews before Heartbleed. The public impression at that time was: "heck, they are even certified". A false impression in hindsight. IIRC, the questionable commit was in their repository for almost two years. I certainly hope that we're doing better on Linux by now. The mailing list discussion gives me reason to hope.

    (Please note that I'm *not* saying that their certification was worth something and I'm also not saying anything about the quality of their code reviews)

    Originally posted by sinepgib
    Oh, but that's where we disagree. ...
    On the contrary, I think we mostly agree. We even agree that China is not to be trusted blindly.

    Where we disagree is "assuming malice" or "ill will". Perhaps I'm misrepresenting you (correct me if I'm wrong), but I get the impression you "always assume best interests, at least for new actors - regardless of any question marks (or until we have evidence)".

    I, on the other hand, am not willing to ignore question marks. I don't call that "assuming malice". To me, this is just "normal prudence".

    I think there is evidence enough:
    • the usefulness of the pull request is very questionable. It raised more than a few eyebrows on the mailing list
    • the pull request may have the potential to eventually open new attack vectors
    • Albert (the original author of the pull request) did not answer a direct question about the use on specific hardware and was quite vague about the use case
    • It is (quite) common knowledge that there is a dark side to "Chinese involvement" that may backfire
    Each of these questions in isolation is not alarming. But the combination of all should at least raise attention levels.

    Originally posted by sinepgib
    That's fair.
    The rapid succession part is about having many servers apply the patch; of course a single server will reboot exactly once after applying it.
    So you save 200 - 500ms on reboot once per month/week? That'll improve your uptime from 99.99990001% to 99.99990002%

    Originally posted by sinepgib
    As in ad fraud or something else?
    Ad fraud is still fraud. But let your fantasy roam...

    Originally posted by sinepgib
    But wouldn't they need to be root anyway for that to be possible? The machine is already owned at that point. Or you mean as an extra measure to avoid being blocked/detected?
    Yes, perhaps. But wasn't that the same attitude that brought us Heartbleed? Let's not put the cart before the horse: "Not applying that patch is not a measure to avoid shenanigans". My real concern is about the negation of this sentence: "Applying that patch can be used to avoid detection of shenanigans". The pull request and its comments say this is *exactly* what this pull request intends to do (at least in my limited understanding):
    • removing a sha256 check
    • using a possibly tainted kernel image to boot from
    Again, each single point on its own is not so concerning. But both of them in combination, permanently, perhaps in all future Linux kernels?
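For readers who haven't followed the mailing list: as far as I understand it, the sha256 check in question is the digest recorded over the loaded kexec segments at load time, which the purgatory code re-verifies before jumping into the new kernel. The Python below is only a conceptual sketch of that idea, not kernel code; the function names and segment contents are hypothetical stand-ins.

```python
import hashlib
from typing import Sequence


def segments_digest(segments: Sequence[bytes]) -> bytes:
    """Hypothetical helper: one running SHA-256 over all segments, in load order."""
    h = hashlib.sha256()
    for seg in segments:
        h.update(seg)
    return h.digest()


def verify_before_jump(segments: Sequence[bytes], recorded: bytes) -> bool:
    """Hypothetical check: only hand over to the new kernel if nothing changed since load."""
    return segments_digest(segments) == recorded


# At load time the digest is computed once and stored.
kernel = b"stand-in for kernel image bytes"
initrd = b"stand-in for initrd bytes"
recorded = segments_digest([kernel, initrd])

# Right before the jump, the segments are hashed again and compared.
print(verify_before_jump([kernel, initrd], recorded))

# A single flipped bit in the kernel segment makes the check fail.
tampered = bytearray(kernel)
tampered[0] ^= 0x01
print(verify_before_jump([bytes(tampered), initrd], recorded))
```

Skipping the re-hash saves whatever time hashing the segments takes, but it also means nothing notices if the image was modified or corrupted between load and jump, which is the trade-off being argued about here.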

  • sinepgib
    replied
    Originally posted by lowflyer
    ... and the other comments

    Why is it these days not enough to say what you mean? Why is it necessary to *always* emphasize what you *did not mean*? (are you assuming malice?)
    Again, no need to assume malice, but the post was inaccurate. Someone explicitly said the kernel works one way, you gave an example from a different project. You might have missed it; I miss a lot of things and get corrected, shit happens. TBF the main problem in the kernel is not so much reviewers as a lack of proper testing infrastructure. There's only so much review can catch, and only so many tests a dev's computer can run for each patch.

    Originally posted by lowflyer
    Ignoring the origin is recklessness. It's only prudent to look a little bit closer, given the track record of China. (I'm getting criticized for assuming malice)
    Oh, but that's where we disagree. It's always the prudent thing to look closer. Absolutely always. One should not assume ill will, but that doesn't mean blindly trusting any patch that gets dropped on the mailing list. Even without malice people screw up and introduce vulnerabilities. My only point is that that can also happen by accident; we don't need to assume that everyone from a country whose management we don't trust is some evil actor (and we have many reasons not to trust their government, don't get me wrong, and I'm aware of its participation in companies).

    Originally posted by lowflyer
    I don't buy the stance that "oh - on linux *everything* is different and better". Linux has its own share of very similar issues.
    It does, and I mentioned one. But the claim is still true: for the most important parts there's a lot of review. Of course, as you said, oversights can and do happen tho, they're just less likely than they were for OpenSSL, since there are a lot of full-time paid contributors whose interests conflict with anyone else inserting a backdoor (even if there's always an incentive to insert their own).

    Originally posted by lowflyer
    coder mentioned the UMN students.
    Indeed he did, and I share his opinion: those UMN students (and probably anyone from that university) specifically merit special suspicion due to their track record. But that doesn't extrapolate to the whole country.
    This *does not* mean that all other reasons are not worth looking at. The reasons you mention are spot on!
    Originally posted by lowflyer
    Albert did already not answer one explicit question. Could be an oversight. Could be a language issue. But could also be a ...
    And as I said, after that, suspicion is perfectly valid. An assumption is something that comes before evidence; that's what I'm criticizing. Until I have evidence, I'll assume good will. UMN? We have evidence: a prior track record by them specifically. This guy? Yes, now we have actual reasons to be suspicious, because after being questioned directly he refuses to answer what the use case is. But sometimes a pipe is just a pipe, even when it comes from someone dubious.

    Originally posted by lowflyer
    Shaving off less than 500ms is not what the big companies are looking for. Live patching is a technique that *avoids* going through a reboot. And which security patch needs a reboot in rapid succession? This does not negate that shorter boot times are a good thing! I never said something like that.
    That's fair.
    The rapid succession part is about having many servers apply the patch; of course a single server will reboot exactly once after applying it.

    Originally posted by lowflyer
    A hint is the title of the image: "Click Farm".
    As in ad fraud or something else?

    Originally posted by lowflyer
    Another hint is in the last line of my previous post:
    But wouldn't they need to be root anyway for that to be possible? The machine is already owned at that point. Or you mean as an extra measure to avoid being blocked/detected?
