ByteDance Working To Make It Faster Kexec Booting The Linux Kernel


  • #21
    Originally posted by Ermine View Post
    No, ByteDance, you won't earn redemption this way.
    I think that's not the point. They're presumably doing this to suit needs they have, and upstreaming the patches so they don't have the burden of maintaining them out-of-tree.



    • #22
      Originally posted by coder View Post
      My assumption is that BIOS/UEFI triggers devices to reset themselves in some fashion. Then, when the kernel's device driver starts interacting with the device, it's essentially starting from a blank slate. I don't know if this is true, however.

      In contrast, if you merely reset the kernel, then the various devices in the system could be left essentially in their previous state. This could tickle bugs in device drivers that you wouldn't hit in a full reboot.

      Again, all of this is conjecture. If anyone has actual knowledge to share on the subject, please do.
      Yes, that's what the UEFI/BIOS does. It goes through the whole POST sequence and sets the CPU/RAM clocks, timings, etc. Then it goes through the PCI code and sets the registers there. Some PCI registers are read-only, others are read/write; depending on the hardware you can set bus master, cache, cache line size, etc. Those are the first registers in the first line; the rest is device-dependent.
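
To make the "first registers in the first line" concrete, here is a minimal Python sketch that decodes the first 16 bytes of a standard PCI configuration header, including the bus-master bit and the cache line size mentioned above. The offsets follow the standard PCI header layout; the function name and the sample bytes are made up for illustration and do not come from any real device.

```python
import struct


def decode_pci_header(cfg: bytes) -> dict:
    """Decode selected fields from the first 16 bytes of PCI config space."""
    if len(cfg) < 16:
        raise ValueError("need at least the first 16 bytes of config space")
    # 0x00 vendor ID, 0x02 device ID, 0x04 command, 0x06 status (little-endian)
    vendor, device, command, _status = struct.unpack_from("<HHHH", cfg, 0x00)
    # 0x0C cache line size, 0x0D latency timer, 0x0E header type, 0x0F BIST
    cache_line, latency, header_type, _bist = struct.unpack_from("<BBBB", cfg, 0x0C)
    return {
        "vendor_id": vendor,
        "device_id": device,
        "io_space": bool(command & 0x1),      # command bit 0
        "memory_space": bool(command & 0x2),  # command bit 1
        "bus_master": bool(command & 0x4),    # command bit 2
        "cache_line_size_bytes": cache_line * 4,  # register counts 32-bit words
        "latency_timer": latency,
        "header_type": header_type & 0x7F,    # top bit is the multi-function flag
    }


# Hypothetical header: memory decoding and bus mastering on, 64-byte cache line.
SAMPLE_HEADER = struct.pack(
    "<HHHHBBBBBBBB",
    0x8086, 0x1234, 0x0006, 0x0010,  # vendor, device, command, status
    0x01, 0x00, 0x00, 0x02,          # revision, prog IF, subclass, class
    0x10, 0x00, 0x00, 0x00,          # cache line size, latency, header type, BIST
)

if __name__ == "__main__":
    for field, value in decode_pci_header(SAMPLE_HEADER).items():
        print(f"{field}: {value}")
```

On Linux the same bytes can be read from a device's `config` file under `/sys/bus/pci/devices/`, so the decoder could be pointed at a real device before and after a kexec to see which of these settings survived.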

      Utilities for DOS and Windows (uwe-sieber.de). Really old-school shit, back when you had to do device setup manually in some cases, and int13 after int13 is the soft reset: you would reload the boot sector without POSTing your system and reinitializing your PCI hardware. It was useful in some weird bug cases (VIA chipsets, cough cough).

      So kexec is the Linux version of the old DOS int13: you kill the OS without a reboot and reload the new kernel image.
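
For anyone who has not used it, the "load a new kernel image, then jump into it" flow can be sketched roughly as below. This Python snippet only builds and prints the two `kexec` invocations from kexec-tools (`-l` to stage a kernel, `-e` to execute it); it does not run them. The helper name and the `/boot` paths are placeholders, not something taken from this thread.

```python
import shlex


def kexec_commands(kernel, initrd=None, cmdline=None):
    """Return the two kexec-tools invocations: stage the kernel, then jump to it."""
    load = ["kexec", "-l", kernel]
    if initrd:
        load.append(f"--initrd={initrd}")
    if cmdline is None:
        # Reuse the command line of the currently running kernel.
        load.append("--reuse-cmdline")
    else:
        load.append(f"--append={cmdline}")
    # "kexec -e" jumps straight into the staged kernel with no firmware POST.
    return [load, ["kexec", "-e"]]


if __name__ == "__main__":
    # Placeholder paths; adjust to whatever kernel and initrd the system has.
    for cmd in kexec_commands("/boot/vmlinuz", initrd="/boot/initrd.img"):
        print(shlex.join(cmd))
```

Note that `kexec -e` does not shut services down first; on systemd machines `systemctl kexec` is the usual way to stop everything cleanly before the jump.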

      So what happens now: all your PCI/PCIe/AGP devices keep their settings, and the CPU microcode stays the same as well, I guess. Usually the kernel boot will reinitialize all devices according to the driver specs; if it does not, that could be considered a bug. The kernel has tons of PCI quirks etc. that get checked on boot, so normally the PCI registers get rewritten anyway, and if not they run with the prior settings. What kind of weird bugs can spring from that, I can only guess.

      The most difficult part will probably be ACPI. I bet it will bug out; nowadays it does so all the time when waking up from deep sleep, and it gets fixed all the time. Someone should test kexec on laptops: reload a new kernel, close the lid and leave it there for 12 hours. If it's nicely deep-fried, you know it didn't work.



      • #23
        Originally posted by S.Pam View Post

        No, kexec is a reboot without BIOS POST/reboot. It simply loads a new Linux kernel and kills the old one.

        On servers, a full reboot can be several minutes. I should really use kexec more myself.
        Maybe on servers, but many desktop systems may hang when you perform kexec multiple times in a row (might also be distro related). Even a single kexec might crash the system. I've tried it a few times and realized how unstable it was.



        • #24
          I would assume desktop hardware is more unstable for kexec because it's not really a use case that gets considered for desktop hardware. Also, the kernel does not necessarily deal with PCI quirks; you can turn this option off, and I don't know how PCI quirks deal with virtual hardware or whether they are even applicable on a VM. The only place where I can think this could be useful is cutting the delay in growing a system on demand? With redundancy you usually don't have your servers down at the same time. Anyway, it can't hurt either, unless it creates bugs.



          • #25
            Originally posted by stormcrow View Post

            It's not an either/or thing. Any given person voicing, displaying, or even indirectly associating with minority groups or political dissent can and will be targeted by governments, NGO surveillance, and criminal gangs.

            No, people shouldn't be "more worried over X than Y". People should just be worried and take care to protect their digital lives as well as they can by as much prudence as they're capable. Usually that means leaving nation state level attacks largely to protection by professionals (like the forthcoming iOS lockdown mode), but that doesn't mean people should just throw up their hands in helplessness against government overreach - even if that government is China, the US, the UK, or NK. There's plenty to be done at the individual level to protect against phishing-style attacks, which is where the majority of breaches occur even when involving nation state aggressors.
            There is a level of trust. Israel and China are often even partners in crime when Israel sells Western military secrets to China through Israeli spies. I distrust Israeli and Chinese hardware or software about the same. But unfortunately every chip seems to have backdoors by Western laws. Then you have the big Five Eyes with PRISM (formerly ECHELON) spying on the entire internet. Anyway, as long as the patch is good it should be good.



            • #26
              Originally posted by milkylainen View Post

              I think you've answered your own question?
              But to clarify. Stable (or unstable) reset state != re-running initialization in software.
              So mishaps do happen.
              So you want to say that some drivers take a short-cut and assume some kind of known reset state? Makes sense that way, but seems a bit brittle, since you don't know what the firmware did to the hardware before passing control to the kernel.



              • #27
                Originally posted by archkde View Post
                So you want to say that some drivers take a short-cut and assume some kind of known reset state? Makes sense that way, but seems a bit brittle, since you don't know what the firmware did to the hardware before passing control to the kernel.
                They shouldn't intentionally take any short-cuts, but the way they're nearly always used is from a full reboot. So, if you want the simplest and best-tested path, stick with full reboots.



                • #28
                  Originally posted by RejectModernity View Post
                  Autism. All kernel patches get checked 10 times before merging. What you should care about is hardware Israeli-American backdoors in your CPU.
                  You telling me what I should care about is blowing the horn of the CCP.

                  I am amazed that so many of you are ready to pick up and run with an argument without having read it because it seems to criticize China. I picked China because the original article was about TikTok™. Last time I checked this company was from China, and I'm ready to bet that it is still Chinese at the time of writing this post.

                  Apparently only one guy seems to take the point seriously: "Why on earth do they need that?". Most seem to be OK with just about anything "as long as it's good":

                  Originally posted by timofonic View Post
                  Anyway, every good contribution is welcome. It's only for servers, but better than nothing.
                  Originally posted by cj.wijtmans View Post
                  ... as long as the patch is good it should be good.
                  May I remind you of Heartbleed. That was also a commit that was checked 10 times before it was merged. Mistakes happen and that's the reason why I think it is important to look at the intent of this "fix". We are speculating here about a possible use case and the comments of the author of the changes (Albert Huangtjie) do not shine a bright light on it. My phantasies go wild thinking of possible malicious uses of this function.



                  • #29
                    Originally posted by lowflyer View Post
                    You telling me what I should care about is blowing the horn of the CCP.

                    I am amazed that so many of you are ready to pick up and run with an argument without having read it because it seems to criticize China. I picked China because the original article was about TikTok™. Last time I checked this company was from China, and I'm ready to bet that it is still Chinese at the time of writing this post.
                    It's not about whether or not it criticizes China, but whether or not it implies China is any different than any other country in that regard. Note nobody is assuming any more good will from China than they are assuming from western governments or companies. You explicitly said you don't trust it not because you don't understand the use case (which is a very valid reason to not use the feature), but because of where it comes from.
                    My reaction would be the same if you had said you don't trust it because it comes from Intel or Microsoft, and for the latter you can actually check my previous comments in the forum to see I'm honest about it.

                    Originally posted by lowflyer View Post
                    May I remind you of Heartbleed. That was also a commit that was checked 10 times before it was merged.
                    No, it wasn't. There were exactly two people looking at commits for OpenSSL; because of that, there was a huge discussion at the time about relying on the voluntary work of a few people for critical infrastructure.

                    Originally posted by lowflyer View Post
                    Mistakes happen and that's the reason why I think it is important to look at the intent of this "fix".
                    So now it's about mistakes. I'll give you the benefit of the doubt and say it was just the way you expressed it that made it seem like you were accusing someone of malice just for their provenance, when in reality it was just an assumption about the quality of their programmers.

                    Originally posted by lowflyer View Post
                    We are speculating here about a possible use case and the comments of the author of the changes (Albert Huangtjie) do not shine a bright light on it.
                    The use case is pretty much the same as all other boot time optimizations we've been seeing from western companies such as Amazon and Google. Maybe questionable, but booting machines on the fly is currently valuable, and doing it fast when the actual spikes in usage appear more so.

                    Originally posted by lowflyer View Post
                    My phantasies go wild thinking of possible malicious uses of this function.
                    Care to give an example?



                    • #30
                      Originally posted by coder View Post
                      They shouldn't intentionally take any short-cuts, but the way they're nearly always used is from a full reboot. So, if you want the simplest and best-tested path, stick with full reboots.
                      Why settle for a mediocre solution when you can do as the ByteDance folks have done and make it work to best suit your business needs?

