systemd Clocks In At More Than 1.2 Million Lines


  • Originally posted by aht0
    Frankly, that does not look like a situation I should run into more often than winning a million in the lottery, unless there are a gazillion processes, the software isn't quite stable, and/or I somehow try to mass-kill many processes at once. Is the latter what causes the timeouts when shutting down a systemd Linux machine? I remember it completely baffled me when I used openSUSE a while ago: boot was actually fine, but something during shutdown ALWAYS timed out..
    Those timeouts that systemd hits on shutdown are not a defect in systemd.

    The reason systemd times out on shutdown and on service stop/restart is that, thanks to cgroups, it detects that processes are still running. On shutdown, systemd runs the stop command for a service, then checks whether all processes belonging to that service have in fact stopped; if not, it waits, and after a configurable amount of time it forcibly kills them.
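    To make the cgroup check above concrete, here is a minimal Python sketch that lists the PIDs still present in a service's cgroup subtree. It assumes the cgroup v2 unified hierarchy, where each cgroup directory holds a `cgroup.procs` file with one PID per line; the helper name `remaining_pids` and the unit path are hypothetical, and this is not how systemd itself is implemented.

```python
import os
from pathlib import Path
from typing import List


def remaining_pids(cgroup_dir: str) -> List[int]:
    """Return the sorted PIDs still attached to a cgroup or any of its child cgroups.

    Assumes the cgroup v2 unified hierarchy, where each cgroup directory holds a
    `cgroup.procs` file listing one PID per line. Child cgroups are
    subdirectories, so the whole subtree is walked.
    """
    pids = set()
    for dirpath, _dirnames, filenames in os.walk(cgroup_dir):
        if "cgroup.procs" in filenames:
            text = Path(dirpath, "cgroup.procs").read_text()
            pids.update(int(token) for token in text.split())
    return sorted(pids)


if __name__ == "__main__":
    # Hypothetical unit path; on a cgroup v2 system a service usually lives
    # somewhere like /sys/fs/cgroup/system.slice/<name>.service
    unit_cgroup = "/sys/fs/cgroup/system.slice/example.service"
    if os.path.isdir(unit_cgroup):
        leftover = remaining_pids(unit_cgroup)
        print("processes still running:" if leftover else "clean stop", leftover)
```

    An empty list after the stop command means the service shut down cleanly; anything left over is what a cgroup-aware service manager would wait on and eventually kill.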

    Let's look at how sysvinit and the like handle it.
    On shutdown:
    1) send every running service its stop command, in order of the kill number;
    2) after all services have been sent stop, run a killall to clean house.
    Please note: if the last process to get a stop order is still running, the next operation is sysvinit's killall, so there is basically no delay before the killall cleanup arrives. systemd changes this to a predictable delay. Yes, this is one of the ways sysvinit at times screwed over databases: one delayed by I/O for some reason would be killed before its final writes were done, because there was no delay.
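    The difference between the two shutdown styles described above can be sketched as a toy model in plain Python (no real processes are touched). A grace period of 0 stands in for a killall that follows the last stop order immediately; 90 seconds matches systemd's default stop timeout (DefaultTimeoutStopSec=). The service names and timings are made-up illustration values.

```python
from typing import Dict, List, Optional, Tuple


def shutdown(services: Dict[str, Optional[float]], grace: float,
             tick: float = 0.5) -> Tuple[List[str], List[str]]:
    """Toy model of "send every service its stop request, wait up to `grace`
    seconds, then kill whatever is left".

    `services` maps a name to the seconds it needs after the stop request to
    exit cleanly (e.g. to flush writes), or None if it never exits on its own
    (a leaked or hung process). Returns (exited_cleanly, force_killed).
    """
    elapsed = 0.0
    while True:
        running = [name for name, need in services.items()
                   if need is None or need > elapsed]
        if not running or elapsed >= grace:
            break
        elapsed += tick
    clean = sorted(name for name in services if name not in running)
    return clean, sorted(running)


services = {"db": 3.0, "web": 0.0, "leaked-helper": None}
print(shutdown(services, grace=0))   # killall right after the stop orders
# (['web'], ['db', 'leaked-helper'])
print(shutdown(services, grace=90))  # wait up to the stop timeout first
# (['db', 'web'], ['leaked-helper'])
```

    With no grace period the slow-flushing database is killed mid-write; with a timeout it finishes, and only the leaked helper is force-killed (which is exactly when the timeout message appears).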

    Next is kind of worse.
    You attempt a sysvinit service restart and the service fails to restart, because a leaked process is holding something the service needs in order to restart. Yes, doing a systemctl restart on something like this under systemd, you would see a timeout as well, but the service would be successfully restarted.
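    A stand-in for "a leaked process holding something the service needs": in this Python sketch an exclusive flock on a lock file plays the role of the held resource (in practice it might be a port, a pidfile lock, a device and so on). Both "processes" are simulated as two separate open file handles in one Python process, which on Linux conflict the same way two processes would, because flock locks belong to the open file description. The helper name and lock path are hypothetical.

```python
import fcntl
import os
import tempfile


def try_acquire(lock_path: str):
    """Try to take an exclusive, non-blocking flock on lock_path.

    Returns the open file object on success (keep it open to hold the lock),
    or None if another open file description already holds the lock.
    """
    f = open(lock_path, "w")
    try:
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


# Hypothetical lock file for a hypothetical service.
lock_path = os.path.join(tempfile.mkdtemp(), "example-service.lock")

leaked = try_acquire(lock_path)     # helper the old instance never stopped
restarted = try_acquire(lock_path)  # the "restarted" service wants the same lock
print("restart succeeded" if restarted else "restart failed: lock still held by leaked process")
```

    Killing the leaked holder (here, closing `leaked`) releases the lock, which is why a reboot "fixes" this class of problem while a plain service restart does not.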

    Basically, if you are seeing timeouts with systemd running normal sysvinit scripts, your sysvinit scripts have a defect that needs fixing.

    systemd turns what was a silent failure on shutdown with sysvinit into a loud one, with those timeout messages. That failure does not just affect the services with that problem at shutdown; it also affects those services whenever you wish to restart them.

    This is part of the problem. People complain about systemd's shutdown timeouts without thinking that maybe systemd is right here and it might pay to research why. systemd's shutdown timeouts are pointing you to broken init scripts.

    For the whole time I have run systemd I have not had a single shutdown timeout, but my init scripts had been audited for process leakage, using PID namespaces, well before that. Process leakage was an insanely common defect causing random issues well before systemd existed.

    systemd's timeout messages on shutdown are just a symptom of non-systemd defects. Yes, defects that sysvinit would silently hide until the day they catch you, when you end up rebooting a server instead of just restarting a service, because the service will not restart.

    Yes, these messes are why I say systemd is the init system we had to have. One of the results of systemd was that a lot of sysvinit scripts in fact got fixed, or were deprecated in favor of a systemd unit file that is also fixed.


    • Originally posted by tildearrow

      What if the user has no smartphone or other computer to Google with? (Remember that his computer is borked and can't connect to the Internet.)
      Sure, let's play the mitigation game. We can come up with ever sillier scenarios and how we get out of them.

      This user had access to install media, so they could have live-booted it, done the Google search and fixed their problem.

      Not knowing is fine (how would we ever learn anything new?). Being able to figure out what is going on is a real, learned skill, and I understand that too.

      Annoying problem? Sure (I'd be annoyed too). But pining for the "Good Old Days (tm)" and forgetting that there were nightmares back then too is a little on the nose.


      • Originally posted by oiaohm

        Those timeouts that systemd hits on shutdown are not a defect in systemd.

        The reason systemd times out on shutdown and on service stop/restart is that, thanks to cgroups, it detects that processes are still running. On shutdown, systemd runs the stop command for a service, then checks whether all processes belonging to that service have in fact stopped; if not, it waits, and after a configurable amount of time it forcibly kills them.
        ..

        systemd's timeout messages on shutdown are just a symptom of non-systemd defects. Yes, defects that sysvinit would silently hide until the day they catch you, when you end up rebooting a server instead of just restarting a service, because the service will not restart.

        Yes, these messes are why I say systemd is the init system we had to have. One of the results of systemd was that a lot of sysvinit scripts in fact got fixed, or were deprecated in favor of a systemd unit file that is also fixed.
        Diagnosis done: after administering the cure and fixing all the problems, we can then happily ditch 1.2M LOC of diagnostic software? :P
        Just kidding.

        I am not going to attempt a further autopsy on the topic. Just happy to be in the non-Linux camp.


        • Originally posted by aht0
          Diagnosis done: after administering the cure and fixing all the problems, we can then happily ditch 1.2M LOC of diagnostic software? :P
          Just kidding.

          I am not going to attempt a further autopsy on the topic. Just happy to be in the non-Linux camp.
          LOL, this is such a joke of an answer. Process leak issues with background services affect OS X, Windows, BSD, Linux ...

          Unless you are using SMF from Solaris or systemd as init, at this stage you are lacking the tools to detect and deal with the defect.

          Really, we cannot ever 100 percent ditch the diagnostic software. With the cgroup and pidfile stuff fixed up, items like OpenRC could be built with the required diagnostics to detect these issues early.

          Also, the 1.2 million lines of code in systemd is way smaller than a minimal self-hosting FreeBSD. If you are suggesting BSD as an option, you are a hypocrite, because it has a large block of diagnostic software..

          This is the laugh; this is the hard reality of systemd: it finds issues that have been silent faults in the background that randomly hit people. systemd is also working on the frameworks required to address these silent faults.

          aht0, seriously, look closely at your current solution: how would you find a process leak event and know you are going to need to fix it? Under FreeBSD you will have to put services in jails to find out. Out of the box you will not be informed that you have a problem.

          So since aht0 is happy to live with random, hard-to-diagnose problems, I cannot stop him/her from being that foolish.


          • and here was me thinking this thread would not deliver fireworks since hreindl got banned


            • Originally posted by grumbert
              and here was me thinking this thread would not deliver fireworks since hreindl got banned
              It's a thread about systemd. That's all that I should have to say.


              • Originally posted by Raka555
                In the "old days" Linux was for power users only: people who understood or wanted to understand how things work.
                Linux was very user friendly, but it was picky about its users.

                Then came the drive to bring Linux to the masses.
                Slowly but surely they took away the character of what was "Linux" to make it more "Windowsy", so that people with limited capacity could still feel at home.
                The current state of "Linux" is a bunch of bloated applications trying to mimic Windows.

                Systemd is just another (albeit huge) step towards taking control away from power users.
                If you like systemd you are not a power user, you are just someone who wants a free OS.

                Sadly, the power users have to start looking elsewhere.
                I recommend they look at "Alpine Linux" and "Void Linux".
                I won't even recommend Devuan, since it's based on Debian, which has lost its way.

                The world urgently needs a new OS.
                I suspect that if that OS ever sees the light, it will be written in Zig.

                The beauty of Linux is all the flavours available to us!
                That's the true spirit.
                More flavours mean more choice, and that is always good.


                • Originally posted by oiaohm
                  LOL, this is such a joke of an answer. (a)Process leak issues with background services affect OS X, Windows, BSD, Linux ...
                  Unless you are using SMF from Solaris or systemd as init, at this stage you are lacking the tools to detect and deal with the defect.
                  Really, we cannot ever 100 percent ditch the diagnostic software. With (b)the cgroup and pidfile stuff fixed up, items like OpenRC could be built with the required diagnostics to detect these issues early.
                  (a)Hold your horses for a second! What precisely are you basing this claim on? Have you actually tested it against OS X, Windows XP to 10 RS5, all 4 BSDs, Solaris..? Or is it just an extrapolation based on which init/service manager each of the OSes in question happens to have? In the latter case, it's speculative. The fact that Linux happens to have a particular issue definitely does not mean everything else has it. None of the OSes in question has any historic relation to Linux or shares any relevant code historically.

                  (b)Cgroups are a purely Linux-specific implementation. None of the others share this particular way of managing resources.

                  Originally posted by oiaohm
                  (a)Also, the 1.2 million lines of code in systemd is way smaller than a minimal self-hosting FreeBSD. If you are suggesting BSD as an option, you are a hypocrite, because (b)it has a large block of diagnostic software..
                  You know, replace your term "self-host" with "core system". It's more easily understandable and more fitting. You won't see FreeBSD in "pieces" anyway.

                  (a) Are you very sure of your claim? Because there's a drastic difference between "FreeBSD installed by default" and "FreeBSD updated by compiling". I'll explain; it's a bit lengthy, but there's nothing to be done about it..
                  -RELEASE is updated by binary packages, a lot like your average Linux distro. You get whatever components the FreeBSD team includes.
                  -STABLE is something like a rolling release, but wholly compiled. You pull in the latest code revision, build a new world and kernel, run mergemaster -Ui, reboot and voilà, upgrade done. The trick is that, based on what's written in your /etc/src.conf, your system build may exclude lots of core system components you deem unnecessary.

                  https://www.freebsd.org/cgi/man.cgi?query=src.conf shows you the possible options for building upgrades. Based on this, I suspect that IF I wanted to build a truly minimal FreeBSD core system, the LOC ratio 'FreeBSD vs systemd' would take serious adjustment, and not in systemd's favor. The current FreeBSD standard install should be somewhere around 10 million LOC of source. IF I stripped out everything, leaving just the FreeBSD kernel, libc and init, I am pretty sure the code base left would be smaller than systemd's LOC stand-alone.

                  But why are you comparing systemd (stand-alone) vs FreeBSD (complete OS) LOC? systemd itself is not stand-alone. At minimum, it needs a kernel as well. And, oh boy, the kernel is 25+M LOC. Let's be fair: comparing systemd+kernel vs a FreeBSD base install, the latter's LOC is almost three times less.
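                  Plugging in the rough figures quoted in this thread (about 1.2 million lines for systemd, 25+ million for the kernel, roughly 10 million for a standard FreeBSD install), the ratio behind "almost three times" works out as:

```latex
\frac{\mathrm{LOC}_{\text{systemd}} + \mathrm{LOC}_{\text{kernel}}}{\mathrm{LOC}_{\text{FreeBSD base}}}
\approx \frac{1.2\,\text{M} + 25\,\text{M}}{10\,\text{M}} \approx 2.6
```

                  Since all three inputs are estimates, the result is only good to about one significant figure.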

                  Taking the comparison further, though it's pointless... let's compare a Linux distro vs FreeBSD. A recent Debian distro install is something like half a billion LOC.

                  (b) Partially correct. It has. But are the testing/debugging bits installed? Read what I wrote above. It's much like Gentoo or Arch: I can 'massage' the OS in quite a few ways. You don't have to guess what I run. Besides, excluding unused components from the core system reduces the build time spent on system upgrades, so there's an incentive to exclude bits.


                  Originally posted by oiaohm
                  This is the laugh; this is the hard reality of systemd: it finds (a) issues that have been silent faults in the background that randomly hit people. systemd is also working on the frameworks required to address these silent faults.
                  (a) Issues for the Linux kernel, mind you, which has been developed on its own. You seem to think that almost all OSes have identical flaws hidden inside them and only systemd can find them..

                  Do you know that BSD pkill and Linux's pkill aren't even strictly the same program? They share a name and do similar things, true, but the authors differ and the code each is based on differs.. All the BSDs differ internally. Bold claims.


                  Originally posted by oiaohm
                  aht0, seriously, look closely at your current solution: how would you find a process leak event and know you are going to need to fix it? Under FreeBSD you will have to put services in jails to find out. Out of the box you will not be informed that you have a problem.
                  First, I don't really think there is actually such an issue present. I tried searching for information about it but came up blank. Just no information. The problem either does not exist, or it expresses itself extremely rarely && people never bothered writing about it. Kindly provide me with pointers, please.

                  Originally posted by oiaohm
                  So since aht0 is happy to live with random, hard-to-diagnose problems, I cannot stop him/her from being that foolish.
                  IF I had problems, I'd look for alternatives.
                  And FYI, I am a ~40-year-old male.
                  Last edited by aht0; 24 May 2019, 04:25 PM.


                  • Originally posted by aht0
                    (a)Hold your horses for a second! What precisely are you basing this claim on? Have you actually tested it against OS X, Windows XP to 10 RS5, all 4 BSDs, Solaris..?
                    I have personally tested on all of those. Solaris with SMF can also deal with process leaks. Cgroups and Zones deal with process leaks. Attempting to use cgroups/zones to deal with pidfile issues does not work out. Sun documented process leaks in Solaris as one of the justifications for having SMF use Zones. The first platform to deal with the process leak problem was Solaris.

                    A process leak is quite simple. It's some process the service starts that is not shut down when stop/restart is run on the service. Since this process was not shut down, it can be holding resources that are required for the service to work correctly.

                    On Windows XP to 10, the most common form of process leak issue happens with winspool, the print spooler, and its drivers.
                    If the Print Spooler service stopped working on your computer, here are some potential fixes to solve this problem on all Windows versions.

                    A person can run through this complete list and still be having random printer problems. The process leak problem temporarily goes away whenever they reboot the machine. Rebooting Windows to fix problems is classed as normal behaviour, and people don't bother bug-reporting it any more.

                    Interestingly enough, various third-party printer drivers for CUPS on OS X and the BSDs also leak processes. If you don't mind a percentage of your users having random printer issues where they have to reboot their systems to fix them, don't deal with this problem. BSD-based solutions could move to always putting services in individual jails, and so become able to detect process leaks.

                    Originally posted by aht0
                    First, I don't really think there is actually such an issue present. I tried searching for information about it but came up blank. Just no information. The problem either does not exist, or it expresses itself extremely rarely && people never bothered writing about it. Kindly provide me with pointers, please.
                    This is basically wrong. The problem expresses itself a lot. People understand how to detect a memory leak; people do not normally understand the process to detect a process leak.

                    Every one of the scripts that you saw systemd timing out on had a process leak. You will find the makers of those programs would have been using the same service shutdown process on BSD and had a process leak over there as well.

                    Process leaks are fairly straightforward to check for. cgroups/zones make it lighter.

                    Process to find a process leak:
                    1) Set up a means to keep track of all processes a service (or services) starts. systemd's cgroups and Solaris Zones fill this role. Ubuntu's Upstart attempted ptrace for this role and it did not work out.
                    2) Start the service.
                    3) Expose it to a normal workload.
                    4) Stop the service.
                    5) Check whether any processes the service started are still running; if some are, you have a process leak. This is when systemd starts printing the timeout error and SMF prints Invalid service shutdown waiting X before terminating Y.

                    Please note point 3: just starting and stopping the service may not reveal the process leak, even though the service will leak processes. The developer cannot run through everything a normal workload will do, so this test needs to be performed by end users.
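                    Step 1 above is where the tracking method matters, and a toy model shows why. Following parent/child links loses any process that double-forks, because the kernel reparents the orphan (to PID 1 by default on Linux), whereas cgroup membership is inherited on fork and unaffected by reparenting. The process table, PIDs and cgroup paths below are invented for illustration; this is a sketch of the idea, not systemd's code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Proc:
    pid: int
    ppid: int     # parent PID; an orphan is reparented (to PID 1 by default on Linux)
    cgroup: str   # inherited on fork and NOT changed by reparenting


def leaked_by_tree(procs: Dict[int, Proc], main_pid: int) -> List[int]:
    """Leftovers found by following parent -> child links down from the main PID."""
    found: List[int] = []
    frontier = [main_pid]
    while frontier:
        parent = frontier.pop()
        for p in procs.values():
            if p.ppid == parent:
                found.append(p.pid)
                frontier.append(p.pid)
    return sorted(found)


def leaked_by_cgroup(procs: Dict[int, Proc], cgroup: str) -> List[int]:
    """Leftovers found by cgroup membership (the cgroup itself or any sub-cgroup)."""
    return sorted(p.pid for p in procs.values()
                  if p.cgroup == cgroup or p.cgroup.startswith(cgroup + "/"))


SERVICE = "/system.slice/example.service"
# Process table after "stop": main PID 100 has exited, but a helper it
# double-forked (PID 102) is still alive and has been reparented to PID 1.
after_stop = {
    1: Proc(1, 0, "/init.scope"),
    102: Proc(102, 1, SERVICE),
    300: Proc(300, 1, "/system.slice/other.service"),
}
print(leaked_by_tree(after_stop, main_pid=100))  # [] - the leak is invisible
print(leaked_by_cgroup(after_stop, SERVICE))      # [102] - the leak is caught
```

                    This is also why ptrace-style fork following is fragile: it has to observe every fork as it happens, while a cgroup simply keeps every descendant inside the service's group.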

                    You get benign process leaks, where all they do is consume a PID and some memory, so basically a different form of memory leak.
                    But you also get harmful process leaks holding locked resources, which stop you from restarting the service correctly. This is like winspool and CUPS, where you cannot print even after restarting the print server. But it's not only print services: this has happened with database services, web services and so on. Anything run as a service that executes sub-programs can lose track of what it started.

                    Process leaks are one of those background problems that, for longer than Linux has existed, people have not been properly running diagnostics for. Remember that SMF in Solaris, using Zones, was the first to go after this problem. We do need to learn from this and improve our service management systems so they do not ignore this problem.

                    The systemd timeout message could be a lot more blunt and clear about what the problem is.


                    • Originally posted by oiaohm
                      5) Check whether any processes the service started are still running; if some are, you have a process leak. This is when systemd starts printing the timeout error and SMF prints Invalid service shutdown waiting X before terminating Y.
                      depinit (written by Richard Lightman) tackled and solved this issue in a much cleaner way. firstly, daemons spawned as background tasks were treated as "legacy": the default preferred option was (is) to run them as foreground processes, and depinit had the option to farm the stdin, stdout *and* stderr separately to *separate* (dependent) services. safe_mysqld, as with all daemon-spawning, was therefore entirely unnecessary and redundant.
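                      the foreground model described above is what makes separate stream routing possible: the supervisor is the direct parent and owns the child's pipes. a minimal python sketch of that idea follows; `run_foreground` and the one-line "service" are hypothetical, this is not depinit's code, and a real supervisor would stream each pipe continuously to its consumer rather than collect the output at exit.

```python
import subprocess
import sys
from typing import List, Tuple


def run_foreground(argv: List[str]) -> Tuple[int, str, str]:
    """Run a 'service' as a foreground child and keep stdout and stderr apart.

    Because the supervisor is the direct parent and owns both pipes, it could
    hand each stream to a different consumer (e.g. separate logger services);
    this sketch just collects them when the child exits.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    out, err = proc.communicate()
    return proc.returncode, out, err


# Hypothetical "service": a one-liner that writes to both streams and exits.
code, out, err = run_foreground([
    sys.executable, "-c",
    "import sys; print('request handled'); print('disk nearly full', file=sys.stderr)",
])
print(code, repr(out), repr(err))  # 0 'request handled\n' 'disk nearly full\n'
```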

                      secondly, depinit captured *all* signals, and i do mean *all*. nothing escaped, except, unfortunately, when testing depinit recursively (yes, this was possible): the linux kernel rather unsportingly sends the signals of orphaned (daemon-spawned) processes down to PID1, unconditionally.

                      depinit was (when run as PID1) capable of hunting down even fork-bombs and malicious viruses, by deploying ever-increasingly-aggressive hunt-and-kill strategies that were SLOWLY escalated.
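                      a "slowly escalated" kill strategy of the kind described above can be sketched as a schedule of signals with growing delays. the delays, attempt count and helper names below are assumptions for illustration; depinit's actual strategy is not described here in enough detail to reproduce.

```python
import signal
from typing import Callable, List, Tuple

Plan = List[Tuple[float, signal.Signals]]


def escalation_plan(first_delay: float = 1.0, factor: float = 2.0,
                    term_attempts: int = 3) -> Plan:
    """(seconds_from_start, signal) pairs: SIGTERMs with growing gaps, then SIGKILL."""
    plan: Plan = []
    t, delay = 0.0, first_delay
    for _ in range(term_attempts):
        plan.append((t, signal.SIGTERM))
        t += delay
        delay *= factor
    plan.append((t, signal.SIGKILL))
    return plan


def hunt(is_alive: Callable[[], bool], send: Callable[[signal.Signals], None],
         plan: Plan,
         sleep: Callable[[float], None] = lambda seconds: None) -> List[signal.Signals]:
    """Walk the plan, waiting between steps and stopping as soon as the target is gone.

    For real use pass time.sleep as `sleep` and e.g. a wrapper around os.kill
    as `send`; the no-op default keeps the sketch fast to test.
    """
    sent: List[signal.Signals] = []
    now = 0.0
    for at, sig in plan:
        sleep(at - now)
        now = at
        if not is_alive():
            break
        send(sig)
        sent.append(sig)
    return sent


# Usage with a fake target that ignores SIGTERM entirely:
print([s.name for s in hunt(lambda: True, lambda sig: None, escalation_plan())])
# ['SIGTERM', 'SIGTERM', 'SIGTERM', 'SIGKILL']
```

                      a well-behaved process exits after the first SIGTERM and never sees the harsher steps; only a stuck or hostile one reaches SIGKILL.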

                      basically, it was extremely well-engineered, and was only a few K-lines of code. systemd is unconscionable, end of discussion, and its carte-blanche deployment, IN DIRECT VIOLATION OF A DEBIAN VOTE, will continue to haunt us until distros give *all* users the FULL right not to be forced into using it.

