ByteDance Working To Make It Faster Kexec Booting The Linux Kernel


  • ByteDance Working To Make It Faster Kexec Booting The Linux Kernel

    Phoronix: ByteDance Working To Make It Faster Kexec Booting The Linux Kernel

    ByteDance, the Chinese company behind TikTok, has been working on a number of Linux kernel optimizations over the past few years, and their most recent work is for faster Kexec rebooting of the kernel. With their massive fleet of servers powering TikTok and other apps, they will do whatever they can to shave milliseconds off the boot/reboot time of their servers, and that is what most of their Linux optimizations have been about -- including this newest patch series for faster Kexec reboots...

    https://www.phoronix.com/news/Byteda...r-Kexec-Reboot

  • #2
    By using Kexec they avoid the more significant downtime of their servers POST'ing and other tasks.
    Wait, reboot also requires POST? Is there no way to skip it? I get it for cold boots, but I find it strange for reboots.

    • #3
      Originally posted by sinepgib

      Wait, reboot also requires POST? Is there no way to skip it? I get it for cold boots, but I find it strange for reboots.
      No, kexec is a reboot without BIOS POST/reboot. It simply loads a new Linux kernel and kills the old one.

      On servers, a full reboot can be several minutes. I should really use kexec more myself.
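
For anyone wanting to try the load-then-jump flow described above, here is a minimal sketch that only builds the command lines and prints them; it does not execute anything. It assumes kexec-tools and systemd are installed, and the paths `/boot/vmlinuz` and `/boot/initrd.img` are placeholders, not something from the article. The helper name `build_kexec_commands` is made up for illustration.

```python
import shlex


def build_kexec_commands(kernel, initrd=None, reuse_cmdline=True):
    """Return the two command lines for a kexec-based reboot.

    Step 1 stages the new kernel in memory (kexec --load).
    Step 2 asks systemd to stop services and jump into the staged
    kernel, skipping firmware POST (systemctl kexec).
    Nothing is executed here; the caller decides whether to run them.
    """
    load = ["kexec", "--load", kernel]
    if initrd:
        load.append(f"--initrd={initrd}")
    if reuse_cmdline:
        # Reuse the running kernel's command line from /proc/cmdline.
        load.append("--reuse-cmdline")
    jump = ["systemctl", "kexec"]
    return [load, jump]


if __name__ == "__main__":
    # Placeholder paths (assumption): adjust to the distribution's layout.
    for cmd in build_kexec_commands("/boot/vmlinuz", "/boot/initrd.img"):
        print(shlex.join(cmd))
```

Both commands need root when actually run, and going through `systemctl kexec` rather than calling `kexec --exec` directly gives services a chance to shut down cleanly first.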


      • #4
        Originally posted by S.Pam
        No, kexec is a reboot without BIOS POST/reboot. It simply loads a new Linux kernel and kills the old one.
        A downside would seem to be that hardware can be in an undefined state, making it more likely to hit driver bugs?


        • #5
          This strikes me as a slightly weird thing for a cloud service to prioritize. If you go a day between reboots then the savings is only 0.00058% more runtime per server. So, that suggests they're rebooting even more frequently - like a couple times/hour. Any idea why?

          For others, it makes more sense. Like if you're selling an appliance that's supposed to have 99.999% uptime, then shaving time off reboots makes sense, as that allows only about 5.3 minutes of downtime per year. A handful of reboots could quickly gobble that up.
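
The arithmetic in this post can be checked with a short sketch. The 0.5 second saving per reboot is an assumption back-derived from the 0.00058% figure (it is not stated in this thread), and the 2-minute reboot time is only an illustrative value; the function names are made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year


def runtime_gain_percent(saved_seconds, interval_seconds=SECONDS_PER_DAY):
    """Extra runtime, as a percentage, from saving `saved_seconds`
    once per reboot interval."""
    return saved_seconds / interval_seconds * 100


def downtime_budget_minutes(availability, minutes_per_year=MINUTES_PER_YEAR):
    """Minutes of downtime per year allowed by an availability target
    given as a fraction, e.g. 0.99999 for five nines."""
    return (1 - availability) * minutes_per_year


def reboots_within_budget(availability, reboot_minutes):
    """Whole reboots of `reboot_minutes` each that fit in the yearly budget."""
    return int(downtime_budget_minutes(availability) // reboot_minutes)


if __name__ == "__main__":
    # Assumed saving of 0.5 s per reboot, one reboot per day.
    print(f"runtime gain: {runtime_gain_percent(0.5):.5f}%")
    print(f"five-nines budget: {downtime_budget_minutes(0.99999):.2f} min/year")
    # Illustrative 2-minute full reboot.
    print(f"2-minute reboots per year: {reboots_within_budget(0.99999, 2.0)}")
```

With those assumptions, a once-a-day reboot gains well under a thousandth of a percent of runtime, while a five-nines budget of a bit over five minutes per year is used up by just two or three full reboots.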


          • #6
            Originally posted by S.Pam
            No, kexec is a reboot without BIOS POST/reboot. It simply loads a new Linux kernel and kills the old one.

            On servers, a full reboot can be several minutes. I should really use kexec more myself.
            Sorry if I wasn't clear, I meant a computer reboot, not kexec. I know what kexec does, but as the name implies it's more similar to calling exec than it is to rebooting the computer. I would have expected the latter to skip POST, that's all (and since I rarely reboot, turning the computer off when I'm not using it instead, I haven't really paid attention to how the reboot itself works).


            • #7
              Originally posted by coder
              A downside would seem to be that hardware can be in an undefined state, making it more likely to hit driver bugs?
              It shouldn't cause that AFAICT, but it certainly doesn't help fix it if it happens for other reasons, right?


              • #8
                Originally posted by coder
                This strikes me as a slightly weird thing for a cloud service to prioritize. If you go a day between reboots then the savings is only 0.00058% more runtime per server. So, that suggests they're rebooting even more frequently - like a couple times/hour. Any idea why?

                For others, it makes more sense. Like if you're selling an appliance that's supposed to have 99.999% uptime, then shaving time off reboots makes sense, as that allows only about 5.3 minutes of downtime per year. A handful of reboots could quickly gobble that up.
                Frequent reboots may make sense in a few scenarios:
                - Batch processes, though the cold boot time should be negligible compared to the work itself;
                - Serverless, but then they probably wouldn't be handling the boot themselves;
                - Spot instances for usage spikes (this is probably the most interesting one for optimizing boot speed);
                - VM migration between instances.


                • #9
                  To me, this doesn't add up.

                  Not that I would trust the intentions of any Chinese company. But optimizing the boot time of a server is wasted effort (I have been told so by many "experts", even here in this forum). coder points out the same thing in post number 5. They must have a use case where they want to reboot many times per hour. Kernel upgrades do not fit that. This is hardly a server environment.

                  Unless you operate something along the lines of this:
                  [attached image: image.png]
                  Did I already say that I don't trust the intentions of any Chinese company?
                  Last edited by lowflyer; 27 July 2022, 05:40 AM. Reason: post number was expanded to something meaningless. fixed.


                  • #10
                    Originally posted by lowflyer
                    Did I already say that I don't trust the intentions of any Chinese company?
                    Autism. All kernel patches get checked 10 times before merging. What you should care about is hardware Israeli-American backdoors in your CPU.
                    Last edited by RejectModernity; 25 July 2022, 12:20 PM.
