Trimming systemd Halved The Boot Time On A PocketBeagle ARM Linux Board

  • #51
    Originally posted by GrayShade
    I'm sure some users need it, but for me it doesn't hang without pulling the cable. That option only tells it to unmount the filesystem after stopping the network. It has no effect if the server is down, sure the network will still be running, but the share is inaccessible.
    Would you mind trying it anyway?

    That said, I also have in my mount options
    x-systemd.idle-timeout=180

    That tells it to unmount if idle for more than 180 seconds. Maybe that helps you work around your case a bit? I think that if there is no communication with the server, it counts as idle?

    https://www.freedesktop.org/software...emd.mount.html
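To make the mount options above concrete, here is a minimal Python sketch that splits the options column of an fstab entry and picks out the x-systemd.* options. The server, share, credentials file and mount point are hypothetical placeholders, and the helper names are made up for illustration. Note that x-systemd.idle-timeout= configures the timeout of the automount unit, so it is normally paired with x-systemd.automount, which is why the example line includes both.

```python
def parse_fstab_line(line):
    """Split one fstab line into its fields; dump/pass default to 0."""
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"not a valid fstab entry: {line!r}")
    spec, mountpoint, fstype, options = fields[:4]
    dump = int(fields[4]) if len(fields) > 4 else 0
    passno = int(fields[5]) if len(fields) > 5 else 0
    return {"spec": spec, "mountpoint": mountpoint, "fstype": fstype,
            "options": options.split(","), "dump": dump, "pass": passno}


def systemd_options(entry):
    """Return only the x-systemd.* options as a {name: value} mapping."""
    result = {}
    for opt in entry["options"]:
        if opt.startswith("x-systemd."):
            name, _, value = opt.partition("=")
            result[name] = value  # flags such as x-systemd.automount map to ""
    return result


# Hypothetical CIFS entry: server, share and credentials path are placeholders.
line = ("//nas.example/share /mnt/share cifs "
        "credentials=/etc/cifs-creds,_netdev,x-systemd.automount,"
        "x-systemd.requires=network-online.target,"
        "x-systemd.idle-timeout=180 0 0")
entry = parse_fstab_line(line)
print(systemd_options(entry))
```

The idle timeout only unmounts a share that nobody is using; it does not shorten the wait when a mounted share becomes unreachable during shutdown.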

    It could use the job stop timeout or anything else. But there's no configuration for that, just a compile-time value of 90 seconds.
    I think their reasoning is that filesystem syncing on shutdown is too important to let random scrubs change it to work around other issues and end up shooting themselves in the foot. Because you know people will try to do that if you let them, and then end up with an unmountable rootfs some months later.

    Or at least that's usually why something is a compile option instead of a config. It's a restricted, "developer-only" option.

    In this case, I think it would make more sense to go on GitHub and either report or contribute a fix for the actual issue you have.

    I mean, find out why it hangs on shutdown only when you pull the cable, and why x-systemd.requires=network-online.target isn't working for you.

    It's not the first time that people on Arch slam their face on a new exciting systemd bug while people on other rolling distros like Tumbleweed and Gentoo are not using the latest and greatest systemd version (for this exact reason).
    Last edited by starshipeleven; 29 October 2019, 07:07 PM.

    • #52
      Originally posted by starshipeleven
      Only way you can trigger that is because some application refuses to stop and systemd then waits.
      Seriously? I think I remember what may have caused the issue, but, like, xmesg? Is xmesg broken enough to hinder the shutdown process?
      Doesn't it die when the display manager is disabled?

      Originally posted by starshipeleven
      The default timeout does not trigger unless some application hangs and does not terminate gracefully, so systemd waits a while before terminating it by force.
      See above.

      Originally posted by starshipeleven
      Most systems don't have hung applications and shut down normally. If something happens it's a one-off and I personally appreciate that I get notified that something I used recently broke in some way.
      I would appreciate being able to diagnose the root cause too! I have tried with journalctl, but didn't find anything...

      Originally posted by starshipeleven
      If your system has something that hangs routinely, then you need to adjust the default config to deal with your own non-standard usecase.
      Nope. The shutdown hangs are random.

      Originally posted by starshipeleven
      People have been complaining about the 90 seconds to shut down for many years already. Ubuntu/Debian, OpenSUSE and Fedora have done that for a long while, hence "many distros", as that's a good chunk of the ones available.

      I don't know if this is the upstream default or not, but afaik it is more or less the de facto default.
      Yes, that is for the 90 second DefaultTimeoutStopSec. Not for user@.service's TimeoutStopSec. They only added the unit-specific TimeoutStopSec a few months ago in some systemd version.
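The precedence described above can be sketched in Python: a TimeoutStopSec= set on the unit itself (for example in a drop-in for user@.service) wins, and only when it is absent does the manager-wide DefaultTimeoutStopSec= from /etc/systemd/system.conf apply, with 90 s as the upstream default. This is a simplified model, assuming INI-style parsing with configparser and a reduced time-span grammar; it is not how systemd itself parses its configuration, the helper names are hypothetical, and it does not model the hardcoded unmount timeout discussed later in this thread.

```python
import configparser
import re

# Simplified subset of systemd time-span units (real systemd accepts more).
_UNITS = {"ms": 0.001, "s": 1, "sec": 1, "min": 60, "h": 3600}
_SPAN = re.compile(r"(\d+)\s*([a-z]+)")


def parse_timespan(text):
    """Convert a simplified systemd time span ("90s", "1min 30s") to seconds."""
    text = text.strip()
    if text.isdigit():  # a bare number is read as seconds
        return float(text)
    if not re.fullmatch(r"(?:\d+\s*[a-z]+\s*)+", text):
        raise ValueError(f"unsupported time span: {text!r}")
    total = 0.0
    for number, unit in _SPAN.findall(text):
        if unit not in _UNITS:
            raise ValueError(f"unknown unit {unit!r} in {text!r}")
        total += int(number) * _UNITS[unit]
    return total


def _read_key(text, section, key):
    """Return the last value of `key` in `section`, or None if absent."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep systemd's CamelCase key names
    parser.read_string(text)
    return parser.get(section, key, fallback=None)


def effective_stop_timeout(system_conf, unit_file):
    """Unit-level TimeoutStopSec= wins over the manager-wide default."""
    unit_value = _read_key(unit_file, "Service", "TimeoutStopSec")
    if unit_value is not None:
        return parse_timespan(unit_value)
    default_value = _read_key(system_conf, "Manager", "DefaultTimeoutStopSec")
    if default_value is not None:
        return parse_timespan(default_value)
    return 90.0  # upstream default for DefaultTimeoutStopSec=


system_conf = "[Manager]\nDefaultTimeoutStopSec=10s\n"
dropin = "[Service]\nTimeoutStopSec=30s\n"  # hypothetical drop-in contents
print(effective_stop_timeout(system_conf, dropin))
```

This is why lowering DefaultTimeoutStopSec alone does not help with a unit that sets its own TimeoutStopSec.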

      Originally posted by starshipeleven
      I'm just reacting to your rant with my own rant about people that fucking blame the messenger about their issues, and then blame the system when it's just a configuration issue they could have fixed themselves with 10 seconds of googling.
      You know, I worked around the issue myself even before posting it here, without searching for it or anything. This happened ~2 days ago and only yesterday I worked around it.

      Originally posted by starshipeleven
      I shout at people bullshitting Windows too, don't worry. Boy do I shout at people on Windows too...
      I know this well.

      Originally posted by starshipeleven
      Might be a good idea to switch distro to something else that does not require you to RTFM so often then?

      Arch users will never admit it, but it is a higher-than-average maintenance distro.
      I know this well.
      No, I'm not switching yet... I can sort of deal with this.

      • #53
        Originally posted by GrayShade

        Arch, CIFS shares. It waits 90 seconds for them before rebooting, even with the tweak I suggested to tildearrow, which is expected anyway because that applied to a user session, while in my case it's waiting for an fstab mount.

        Reducing DefaultTimeoutStopSec has no effect, it still waits 90-ish seconds, even though that's the only 90 seconds timeout in /etc/systemd/system.conf. What bugs me even more is that I can't restart it with Ctrl-Alt-Del: it detects that I pressed it, says "forcibly rebooting" then goes on to wait for another minute and a half or so.

        Tested by pulling the cable out of the CIFS server.
        90 seconds is OK for remote filesystems though. But not for user@...

        • #54
          Originally posted by tildearrow
          90 seconds is OK for remote filesystems though. But not for user@...
          To each their own. I've seen the unmount hang a couple of times, but never a user session one. Does the override I mentioned help?

          Originally posted by starshipeleven
          Arch users will never admit it, but it is a higher-than-average maintenance distro.
          YMMV. I've had broken systems on upgrades on Ubuntu, but not on Arch.

          Originally posted by starshipeleven
          x-systemd.idle-timeout=180
          Sure, that's a good idea, but doesn't help in the case I described (mounting the shares and restarting immediately).

          Originally posted by starshipeleven
          I mean finding out why it's hanging on shutdown only when you pull the cable and the x-systemd.requires=network-online.target isn't working for you.
          It's not working because it only adds Requires and After dependencies on network-online.target. If the server is offline, the order my systemd stops things in doesn't matter at all.

          Originally posted by starshipeleven
          It's not the first time that people on Arch slam their face on a new exciting systemd bug while people on other rolling distros like Tumbleweed and Gentoo are not using the latest and greatest systemd version (for this exact reason).
          Don't blame the distro, that code is unchanged since it was added in 2017 in the pull request you've linked to. It uses a hardcoded timeout, probably not because it's too important (really, I can set my memory limit to 2 MB, but can't reduce the unmount timeout because it's too important?) but because that's how somebody implemented it.

          There already is a GitHub issue filed for it.
          Last edited by GrayShade; 30 October 2019, 02:39 AM.

          • #55
            GrayShade I had a few Manjaro systems break... Because they were not updated regularly or someone manually held back a package.

            And one thing which is an utter nightmare is trying to use OpenEmbedded on Manjaro. VMs to the rescue. But they do list validated distros in the manual to be fair.

            • #56
              Originally posted by Volta

              Yes, it was a nightmare. Furthermore, sometimes services failed to start for some reason known only to SysV init and those ugly scripts.
              Incorrect. SysVinit is light and fast; it's like 50-100 lines of C doing exactly what PID 1 has to do.

              The issue with booting was always sysVrc, the collection of scripts written by RH to start services; they were badly written in sh, and that was slow. You can replace sysVrc and have a very fast-booting system with modern service management, and this is EXACTLY the default behaviour of OpenRC, an init system that runs circles around systemD, especially in speed (now you can also use openrc-init, which is like 150 lines doing just what it has to).

              So how can sysvinit be the problem when it is fast?

              • #57
                Originally posted by Naib
                Incorrect. SysVinit is light and fast; it's like 50-100 lines of C doing exactly what PID 1 has to do.
                Sysvinit is extremely dumb without init scripts and lacks most of the stuff a modern init does (even OpenRC adds a ton of stuff like process tracking and such).
                If you want an example of a light and fast but still decent and modern init, look at procd, OpenWrt's init system.

                So how can sysvinit be the problem when it is fast?
                It can be, when you rely on scripts to do anything more than "start all applications in this folder in alphabetical order".
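The "alphabetical order" point can be illustrated with a minimal Python sketch of a SysV-style rc runner: it picks the S* scripts of a runlevel directory (such as /etc/rc3.d) and runs them one at a time with the start argument, in plain lexical order. It assumes the classic S<NN><name> naming convention; real rc implementations also run K* scripts with stop and differ between distros, and the function names here are hypothetical.

```python
import os
import subprocess


def start_order(script_names):
    """S* start scripts in the order a SysV-style rc runs them:
    plain lexical (alphabetical) order of the file names."""
    return sorted(name for name in script_names if name.startswith("S"))


def run_runlevel(directory):
    """Run each start script with the "start" argument, one after another,
    waiting for each to finish before starting the next (hypothetical helper)."""
    for name in start_order(os.listdir(directory)):
        subprocess.run([os.path.join(directory, name), "start"], check=False)


# Hypothetical contents of a runlevel directory such as /etc/rc3.d:
print(start_order(["K20nfs", "S20nfs", "S10network", "S99local", "README"]))
```

Ordering is purely by file name, which is why such scripts use two-digit prefixes: S9late would sort after S10early. Anything smarter (dependencies, parallel start, process tracking) has to live in the scripts themselves.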

                • #58
                  Originally posted by GrayShade
                  Sure, that's a good idea, but doesn't help in the case I described (mounting the shares and restarting immediately).
                  That's why I said workaround. At first I thought you were trolling about that ancient systemd issue, but what you say seems more like a new issue.

                  Don't blame the distro, that code is unchanged since it was added in 2017 in the pull request you've linked to.
                  I mean, I can't reproduce your issue; for me it hangs in any case if I don't have x-systemd.requires=network-online.target, mounted or not.
                  That's why I'm not so sure the bug is in that code.

                  It uses a hardcoded timeout, probably not because it's too important (really, I can set my memory limit to 2 MB, but can't reduce the unmount timeout because it's too important?)
                  FYI: setting a memory limit to a wrong value for a userland application does not leave you with a broken rootfs partition because it was shut down uncleanly.

                  We are talking hard brick vs soft brick.

                  Aka broken fs (risk of data loss) vs broken config (which you can fix by mounting the partition from a live CD and changing the config).

                  There already is a GitHub issue filed for it.
                  Where? I would like to check that.

                  • #59
                    Originally posted by starshipeleven
                    That's why I said workaround. At first I thought you were trolling about that ancient systemd issue, but what you say seems more like a new issue.
                    Fair enough. I think I'll use that setting, it can't hurt.

                    Originally posted by starshipeleven
                    I mean, I can't reproduce your issue, for me it hangs in any case if I don't have the requires network-online.target, mounted or not.
                    That's why I'm not so sure the bug is in that code.
                    Did you try pulling out the network cable from your NAS?

                    Originally posted by starshipeleven
                    FYI: setting a memory limit to a wrong value for a userland application does not leave you with a broken rootfs partition because it was shut down uncleanly.

                    We are talking hard brick vs soft brick.

                    Aka broken fs (risk of data loss) vs broken config (which you can fix by mounting the partition from a live CD and changing the config).
                    I suppose so, but if your root fs doesn't unmount in 30 seconds, I don't think waiting an extra minute will do much.

                    Originally posted by starshipeleven
                    Where? I would like to check that.
                    systemd version the issue has been seen with: 243.78 Used distribution: Arch Linux Expected behaviour you didn't see: Unmount jobs on shutdown respect DefaultTimeoutStopSec and pressing Ctrl-Alt-De...

                    • #60
                      Originally posted by tildearrow
                      Seriously? I think I remember what may have caused the issue, but, like, xmesg? Is xmesg broken enough to hinder the shutdown process?
                      Doesn't it die when the display manager is disabled?
                      I don't even know what "xmesg" is, but if the application does not terminate on SIGTERM, then systemd will wait for the timeout before sending SIGKILL to shut it down by force.
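The stop sequence described here can be sketched in Python for a single process: send SIGTERM, wait up to a timeout, then send SIGKILL. In systemd the corresponding knobs are KillSignal= (SIGTERM by default) and TimeoutStopSec=, and it signals the whole control group rather than one process; this sketch, with a hypothetical stop_process() helper, assumes a POSIX system and ignores those details.

```python
import signal
import subprocess
import sys


def stop_process(proc, timeout):
    """Send SIGTERM, wait up to `timeout` seconds, then fall back to SIGKILL.
    Returns "terminated" if the process exited in time, else "killed"."""
    proc.send_signal(signal.SIGTERM)
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()
        return "killed"


# A well-behaved child: the default SIGTERM action ends it right away.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(stop_process(child, timeout=5))
```

A process that ignores or mishandles SIGTERM only goes away at the SIGKILL step, which is exactly the wait people see at shutdown.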

                      Personally, every time it hung on me it was a parted process spawned by gparted that hung because it ran into issues with a drive and I forgot to kill it manually (I commonly connect sketchy drives for diagnostics and such, not my main drives), or other processes I launched from a terminal that hung for unknown reasons; I could not kill them even as root, and only a reboot and waiting for the timeout got rid of them.

                      I would appreciate being able to diagnose the root cause too! I have tried with journalctl, but didn't find anything...
                      Heh, that's because distros aren't using systemd services/units for each desktop application, so systemd does not track them individually like it does with system services.
                      Any application you start as a user will log its console output under user@.service, and if it hangs or has issues it will hold up that service. I don't know of any way to find out which user application it is without some shenanigans or troubleshooting.

                      Afaik with flatpak each application is run separately through systemd (they usually use systemd-run to start the application with the right sandboxing and such https://www.freedesktop.org/software...stemd-run.html ), so you can see in the logs which process is the offending one.
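The same idea can be applied by hand: starting a desktop program as its own transient user unit gives it a name that systemd tracks and that journalctl can filter on. Below is a minimal Python sketch that only builds the two command lines, assuming a hypothetical unit name and application; it is not a description of what flatpak does internally. Without --scope, systemd-run creates a transient service, whose output goes to the journal by default; note that such a service runs with the user manager's environment, so a graphical program may need extra environment variables to find the display.

```python
import shlex


def systemd_run_argv(unit_name, command):
    """Command line that starts `command` as a transient user service named
    `unit_name` (systemd appends ".service" when no suffix is given)."""
    return ["systemd-run", "--user", f"--unit={unit_name}", "--", *command]


def journal_argv(unit_name):
    """Command line that shows only the log messages of that user unit."""
    return ["journalctl", f"--user-unit={unit_name}"]


# Hypothetical application and unit name, for illustration only.
print(shlex.join(systemd_run_argv("hypothetical-app", ["some-app", "--verbose"])))
print(shlex.join(journal_argv("hypothetical-app")))
```

If such a unit hangs at shutdown, its name shows up in the stop job message instead of everything being lumped under the user's session.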

                      That's one of the reasons I always say the Linux desktop is just a hack of server Linux.
