Debian Developer Resigns From The Systemd Maintainership Team

  • Originally posted by oleid
    Systemd's watchdog has absolutely nothing to do with hardware watchdogs. The daemon has to update a timestamp in the main loop, which systemd checks. Thus, if the daemon is stuck in an infinite loop, the timestamp won't get updated.
    ah, so the program must be hacked for that feature to work
    instead of... you know... the program doing it itself
    note that a program can itself handle SIGSEGV and almost all other signals, except for SIGKILL
    (note on the note: firefox and enlightenment do it)
    it can also re-exec itself if it can't recover

    and it can still fail to deliver the website (for example the hypothetical "won't accept connection" bug)

    also why is it called "watchdog" if it has nothing to do with the watchdog mechanism
    (hint: i don't care, it's stupid)
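A minimal sketch of the "program handles SIGSEGV and re-executes itself" approach, assuming Linux (it re-executes /proc/self/exe). install_reexec_on_crash and reexec_handler are hypothetical names, not taken from firefox, enlightenment or any other program. Only async-signal-safe calls (execve, _exit) are made inside the handler, and SA_NODEFER matters because the signal mask is inherited across execve: without it, SIGSEGV would still be blocked in the fresh process image.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <unistd.h>

extern char **environ;

/* argv of the running program, saved by install_reexec_on_crash(). */
static char **saved_argv;

/* On SIGSEGV, replace the crashed process image with a fresh copy of the
 * same program. execve() is async-signal-safe; malloc, printf and most
 * other library calls are not and must not be used here. */
static void reexec_handler(int sig)
{
    (void)sig;
    execve("/proc/self/exe", saved_argv, environ);
    _exit(127); /* exec failed: exit instead of returning into the fault */
}

/* Install reexec_handler for SIGSEGV. SA_NODEFER keeps SIGSEGV unblocked
 * while the handler runs, so the mask inherited by the new image does not
 * block it. Returns 0 on success, -1 on error. */
static int install_reexec_on_crash(char **argv)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    saved_argv = argv;
    sa.sa_handler = reexec_handler;
    sa.sa_flags = SA_NODEFER;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

It would be called once near the top of main, as install_reexec_on_crash(argv). Nothing here stops a program that crashes right at startup from re-executing itself forever.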

    • Originally posted by gens
      ah, so the program must be hacked for that feature to work
      instead of... you know... the program doing it itself
      How would a hung (as in stuck) service restart itself? That's the idea of a service manager.

      Originally posted by gens
      note that a program can itself handle SIGSEGV and almost all other signals, except for SIGKILL
      (note on the note: firefox and enlightenment do it)
      Sure, I use signal handling in my own code.

      You can handle SIGSEGV in the service, then it won't have to be restarted by the service manager.
      If you don't want to do this on your own, the service manager can do it for you.

      In your case, you'd probably want to send SIGTERM to the service if wget fails, and then provide an implementation for SIGTERM that restarts the service. But I guess adding one line of code (two, if you count the include directive for the header) is simpler than providing an implementation for SIGTERM and SIGSEGV for *every* service you want to be self-restarting. Especially if you need to keep track of how often the service was restarted before: if the crash occurs directly after service start, your SIGSEGV handler might get stuck in an endless restart loop. That's exactly why such complexity should be outsourced.
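The restart bookkeeping mentioned above could be sketched like this, assuming the count is carried across execve in an environment variable. MYDAEMON_RESTARTS and prepare_restart_counter are made-up names. The counter is bumped once at startup rather than inside the signal handler, because setenv is not async-signal-safe; a crash handler that re-executes the program then passes the already-incremented value on.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variable name carrying the restart count across exec. */
#define RESTART_VAR "MYDAEMON_RESTARTS"

/* Call once at startup, before installing any crash handler.
 * Reads how many times the program has already been re-executed and
 * stores count + 1 in the environment for the next execve().
 * Returns the number of restarts so far, or -1 if max_restarts has been
 * reached (the caller should then stop restarting itself). */
static int prepare_restart_counter(int max_restarts)
{
    const char *s = getenv(RESTART_VAR);
    int count = s ? atoi(s) : 0;
    char buf[16];

    if (count >= max_restarts)
        return -1;
    snprintf(buf, sizeof buf, "%d", count + 1);
    if (setenv(RESTART_VAR, buf, 1) != 0)
        return -1;
    return count;
}
```

The count never resets in this sketch; a real daemon would probably want to reset it after running stably for a while, which is left out here.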


      Originally posted by gens
      and it can still fail to deliver the website (for example the hypothetical "won't accept connection" bug)
      As the bug is hypothetical, it's hard to speculate what causes it. Whether the watchdog can detect the error probably depends on where the sd_notify call is inserted.

      Originally posted by gens
      also why is it called "watchdog" if it has nothing to do with the watchdog mechanism
      (hint: i don't care, it's stupid)
      It's a software watchdog.


      Edit:
      Oh, before you start asking: no, the service has no runtime dependency on systemd, only a compile-time dependency. And as it's less than a handful of lines of code, it can easily be #ifdef'ed to avoid even the compile-time dependency. Furthermore, sd_notify simply sends a signal via DBUS, so any service manager can listen to it.
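For reference, the sd_notify(3) manual page documents the wire protocol: the state string (e.g. WATCHDOG=1) is sent as a single datagram to the AF_UNIX socket whose path is in the NOTIFY_SOCKET environment variable, where a leading "@" means the Linux abstract socket namespace. A dependency-free sketch of that, where notify_manager is a hypothetical stand-in and not libsystemd's implementation:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Minimal stand-in for sd_notify(0, state): send `state` as one datagram
 * to the AF_UNIX socket named in $NOTIFY_SOCKET. Returns 1 if sent,
 * 0 if no service manager asked for notifications, -1 on error. */
static int notify_manager(const char *state)
{
    const char *path = getenv("NOTIFY_SOCKET");
    struct sockaddr_un addr;
    size_t len;
    int fd, ok;

    if (path == NULL || path[0] == '\0')
        return 0;                       /* not started by such a manager */
    len = strlen(path);
    if (len >= sizeof addr.sun_path)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, len);
    if (addr.sun_path[0] == '@')        /* Linux abstract namespace */
        addr.sun_path[0] = '\0';

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    ok = sendto(fd, state, strlen(state), 0, (struct sockaddr *)&addr,
                offsetof(struct sockaddr_un, sun_path) + len) >= 0;
    close(fd);
    return ok ? 1 : -1;
}
```

Because it returns 0 when NOTIFY_SOCKET is unset, the same binary runs unchanged under a manager that never sets the variable.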
      Last edited by oleid; 18 November 2014, 05:15 PM.

      • Originally posted by oleid
        How would a hung (as in stuck) service restart itself? That's the idea of a service manager.
        and the whole point of all this is to show that there are things a "service manager" just cannot know are happening
        but a special 4-line script (or a very small C program, whatever) can

        if a "service manager" would do such things, it would need a special function for EVERY "service" ever written

        on the other hand if the people writing these programs would think that it was a good idea they would just put it in their program
        init system / service manager / whatever independent and simple (it would even work on windows)

        • Originally posted by gens
          and the whole point of all this is to show that there are things a "service manager" just cannot know are happening
          but a special 4-line script (or a very small C program, whatever) can
          No, you didn't get the point. sd_notify can, if correctly placed, inform the service manager whether the service is still alive and kicking. Two lines of extra code. Correctly placed means, e.g., in the main loop, which answers the socket connections and (e.g.) forks off the worker processes.
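A sketch of where such a keep-alive could sit in a main loop, assuming the service manager passes the watchdog timeout in WATCHDOG_USEC (microseconds), as sd_watchdog_enabled(3) describes, and pinging at half that interval, as that page recommends. The helper names, and the handle_one_event/notify calls in the final comment, are hypothetical; notify would be sd_notify(0, "WATCHDOG=1") or an equivalent.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Keep-alive period in microseconds: half of WATCHDOG_USEC, or 0 if the
 * variable is unset or not a positive decimal number (watchdog off).
 * WATCHDOG_PID is ignored here for brevity. */
static uint64_t keepalive_period_usec(void)
{
    const char *s = getenv("WATCHDOG_USEC");
    char *end;
    unsigned long long v;

    if (s == NULL || *s == '\0')
        return 0;
    v = strtoull(s, &end, 10);
    if (*end != '\0' || v == 0)
        return 0;
    return (uint64_t)(v / 2);
}

/* Current CLOCK_MONOTONIC time in microseconds. */
static uint64_t now_usec(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Called once per main-loop iteration: returns 1 and records the time in
 * *last if a keep-alive is due, 0 otherwise (including watchdog off). */
static int keepalive_due(uint64_t *last, uint64_t period)
{
    uint64_t now = now_usec();

    if (period == 0 || now - *last < period)
        return 0;
    *last = now;
    return 1;
}

/* Intended use (handle_one_event and notify are hypothetical):
 *
 *     uint64_t last = 0, period = keepalive_period_usec();
 *     for (;;) {
 *         handle_one_event();      // must not block longer than period
 *         if (keepalive_due(&last, period))
 *             notify("WATCHDOG=1");
 *     }
 */
```

If the loop gets stuck inside handle_one_event, the pings stop and the manager's timeout fires; that is all this mechanism can detect.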

          Originally posted by gens
          if a "service manager" would do such things, it would need a special function for EVERY "service" ever written
          Not if using the generic sd_notify.

          Originally posted by gens
          on the other hand if the people writing these programs would think that it was a good idea they would just put it in their program
          init system / service manager / whatever independent and simple (it would even work on windows)
          Signal handlers won't work on windows -- at least not the way you implement them on POSIX.

          According to http://stackoverflow.com/questions/3...lt-under-linux the cleanest way to make your daemon self-restartable is to create a kind of hypervisor process of your own. But isn't that exactly what a generic service manager is for?
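A do-it-yourself "hypervisor" process of the kind described above might look roughly like this. supervise is a hypothetical name, and the sketch only restarts a child that exits with a non-zero status or is killed by a signal; like any plain process watcher, it cannot notice a child that is alive but stuck.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] (with arguments argv) in a child process and restart it
 * whenever it dies abnormally, at most max_restarts times, sleeping
 * delay_sec seconds between attempts.
 * Returns 0 if the child eventually exits with status 0, -1 otherwise. */
static int supervise(char *const argv[], int max_restarts, unsigned delay_sec)
{
    int restarts = 0;

    for (;;) {
        int status;
        pid_t pid = fork();

        if (pid < 0)
            return -1;
        if (pid == 0) {
            execv(argv[0], argv);
            _exit(127);                 /* exec failed */
        }
        while (waitpid(pid, &status, 0) < 0) {
            if (errno != EINTR)
                return -1;
        }
        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
            return 0;                   /* clean exit: nothing to restart */
        if (restarts++ >= max_restarts)
            return -1;                  /* crash loop: give up */
        if (delay_sec > 0)
            sleep(delay_sec);
    }
}
```

The restart limit and the delay are exactly the kind of policy a generic service manager would otherwise provide.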


          If you don't trust the software watchdog, nobody stops you from additionally calling your wget shell script from, e.g., cron, sending SIGTERM to the service when the check fails, and letting systemd restart it automatically.

          I guess you can surely construct a case that can't be detected by the software watchdog but can be detected by your wget script, but I doubt it's the kind of error you find in the wild. That's what service test suites are for, and that's way beyond a simple service manager.
          Last edited by oleid; 18 November 2014, 06:06 PM.

          • Originally posted by oleid
            No, you didn't get the point. sd_notify can, if correctly placed, inform the service manager whether the service is still alive and kicking. Two lines of extra code. Correctly placed means, e.g., in the main loop, which answers the socket connections and (e.g.) forks off the worker processes.

            ...

            I guess you can surely construct a case that can't be detected by the software watchdog but can be detected by your wget script, but I doubt it's the kind of error you find in the wild. That's what service test suites are for, and that's way beyond a simple service manager.
            ...
            ......
            idk what to say
            a modern server program does most of the work in worker threads, meaning that the main loop would keep working fine (and keep pinging the watchdog) even if the workers are broken
            it can do work but send a 500 HTTP message
            it can work absolutely normally but due to some router or caching node not working properly not provide the service that would be expected from a server
            it can have thread management problems and thus send garbage to the client (modern servers do their own memory management instead of relying on sendfile())
            and many more that i can't think of (as it is with bugs)

            you don't make assumptions
            you don't debate about "opinions"
            it either works or it doesn't
            if you want to know if there is power in a socket, you take this and stick it in
            there is no philosophy behind it

            a simple check like the one you speak of, that can be done by the program itself (from a clone() or fork()), is not 100% accurate
            if you point firefox at the webpage, that is
            if you point curl/wget at a webpage you get the same certainty but without doing it yourself
            (ofc from a computer outside of the local network)
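A rough C equivalent of the external wget/curl check, assuming plain HTTP without TLS; probe and http_status are hypothetical names. It only looks at the status line, so, as pointed out further down in the thread, a 200 with an empty or garbage body still passes.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Parse the status line of an HTTP/1.x response ("HTTP/1.1 200 OK").
 * Returns the status code, or -1 if the line is malformed. */
static int http_status(const char *resp)
{
    int major, minor, code;

    if (sscanf(resp, "HTTP/%d.%d %3d", &major, &minor, &code) != 3)
        return -1;
    return code;
}

/* Connect to host:port, request "/", and return the HTTP status code,
 * or -1 if resolving, connecting, sending or receiving fails. */
static int probe(const char *host, const char *port)
{
    struct addrinfo hints, *res, *ai;
    struct timeval tv = { 5, 0 };   /* give up on a hung server after 5 s */
    char req[256], resp[512];
    ssize_t n;
    int fd = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        /* On Linux, SO_SNDTIMEO also bounds connect(). */
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
        setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof tv);
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            break;
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    if (fd < 0)
        return -1;

    snprintf(req, sizeof req,
             "GET / HTTP/1.0\r\nHost: %s\r\nConnection: close\r\n\r\n", host);
    if (send(fd, req, strlen(req), MSG_NOSIGNAL) < 0) {
        close(fd);
        return -1;
    }
    /* Simplification: assume the status line arrives in the first recv. */
    n = recv(fd, resp, sizeof resp - 1, 0);
    close(fd);
    if (n <= 0)
        return -1;
    resp[n] = '\0';
    return http_status(resp);
}
```

A cron job on another machine could run this and, on any result other than 200, trigger a restart, e.g. by having the service sent SIGTERM as suggested earlier in the thread.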

            as for desktop processes, where these things aren't as important
            if something like firefox hangs, you will notice
            if your window manager fails, you will notice
            if you tell your window manager to close a window that is hanging it will tell you that it is not responding and give you an option to kill it
            and so on

            now go tell someone else that they don't get the point

            • Originally posted by gens
              ...
              it is the difference between a banana written as "baeana" and as "apple"
              if a binary log is corrupt you have to rely on the tool to decode the rest of it properly
              if an ASCII text log is corrupt all you have to do is pass it through strings (or use a text editor that won't go nuts on unprintable characters, and that's most of them on many platforms)
              But the text part of the binary log is written as ASCII text, so is there any actual difference?
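The strings-based recovery mentioned in the quote boils down to printing every sufficiently long run of printable bytes. A simplified sketch of that idea, not the actual binutils implementation (which has more options and encodings); print_strings is a hypothetical name, and out may be NULL to only count runs.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified take on strings(1): find every run of at least min_len
 * printable bytes (tab included) in buf. Each run is written to out on
 * its own line unless out is NULL. Returns the number of runs found. */
static int print_strings(const unsigned char *buf, size_t len,
                         size_t min_len, FILE *out)
{
    size_t start = 0;   /* first byte of the current run */
    int found = 0;

    /* i == len acts as a final non-printable byte that ends the last run. */
    for (size_t i = 0; i <= len; i++) {
        if (i < len && (isprint(buf[i]) || buf[i] == '\t'))
            continue;
        if (i - start >= min_len) {
            if (out != NULL) {
                fwrite(buf + start, 1, i - start, out);
                fputc('\n', out);
            }
            found++;
        }
        start = i + 1;
    }
    return found;
}
```

For a damaged log read into memory, print_strings(buf, n, 4, stdout) prints the surviving text lines and skips the garbage between them.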

              • Originally posted by gens
                a modern server program does most of the work in worker threads, meaning that the main loop would keep working fine (and keep pinging the watchdog) even if the workers are broken
                This was merely a suggestion. I don't have a test case of what gets stuck, and I didn't write the hypothetical webserver, so I don't know exactly where to put it.

                Originally posted by gens
                ...
                a simple check like the one you speak of, that can be done by the program itself (from a clone() or fork()), is not 100% accurate
                if you point firefox at the webpage, that is
                if you point curl/wget at a webpage you get the same certainty but without doing it yourself
                (ofc from a computer outside of the local network)
                Sure, if a valid HTTP header is sent, wget helps. But your wget won't notice if you get a 200 and simply a blank page, or maybe some garbage loaded from some arbitrary memory area.

                But that is not what you were talking about. We talked about infinite loops. Of course a watchdog can't help if there is no infinite loop. The 500-ish sort of errors could also be extracted from the journal (it records the priority of each message, such as ERROR, WARNING, etc.), as such errors typically get logged. If you change the subject mid-discussion without saying so, the discussion becomes pointless. No wonder people think you don't get the point.
                Last edited by oleid; 18 November 2014, 06:55 PM.

                • Originally posted by erendorn
                  But the text part of the binary log is written as ASCII text, so is there any actual difference?
                  yes, many differences

                  take for example the name of the program that sent the log msg
                  it is written once and given an "id"
                  an id is just a number and in the rest of the log it is used instead of the name
                  so if one of those entries has the wrong id, the whole line is more or less meaningless
                  the rest is similar

                  it's kind of like simple lossless compression, in that every string that repeats itself is replaced by an index
                  so in the worst case that index gets corrupted and the whole log is worthless

                  the reasoning for a binary log was faster indexing, and that was shown to not be entirely true
                  also who gives a f about how fast a log is parsed (grep can parse thousands of lines a sec)
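To make the index argument concrete, here is a toy model of the scheme described above: each program name is stored once and log entries refer to it by number. This is an illustration under that assumption, not journald's actual file format, and the struct and function names are made up. It shows both failure modes: an out-of-range id is at least detectable, but an id corrupted into another valid id silently attributes the line to the wrong program.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 16

/* Each distinct program name is stored once. */
struct log_table {
    const char *names[MAX_NAMES];
    int count;
};

/* A log line refers to its sender by index instead of by name. */
struct log_entry {
    int name_id;            /* index into log_table.names */
    const char *message;
};

/* Return the id for name, adding it on first use (-1 if the table is full). */
static int intern(struct log_table *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return i;
    if (t->count == MAX_NAMES)
        return -1;
    t->names[t->count] = name;
    return t->count++;
}

/* Format an entry as "name: message". An out-of-range id yields
 * "?: message"; an in-range but wrong id yields the wrong name. */
static void render(const struct log_table *t, const struct log_entry *e,
                   char *buf, size_t len)
{
    const char *name = (e->name_id >= 0 && e->name_id < t->count)
                       ? t->names[e->name_id] : "?";
    snprintf(buf, len, "%s: %s", name, e->message);
}
```

In a plain text log the name is repeated on every line, so damage to one line cannot affect how another line is read.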

                  • Originally posted by oleid
                    This was merely a suggestion. I don't have a test case of what gets stuck, and I didn't write the hypothetical webserver, so I don't know exactly where to put it.
                    i just realized this morning
                    you are suggesting a systemd specific mechanism (sd_notify) that uses the OO dbus with the reasoning that it is just a couple lines of code
                    thus making it dependent on having two things running that you don't need on a specialized server (dbus and a compliant process tracker aka systemd)
                    and a library (or two)

                    while the "program does it itself" solution is also short (cca 7-10 simple lines of C) and is not just init/process tracker independent but also OS independent and needs only the kernel

                    not to even go into the discussion that process tracking does not necessarily need to be part of the init
                    (i'm making a process tracker, for fun)


                    let's have another quote on complicating things:
                    "Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction."
                    -Albert Einstein


                    edit: don't get me wrong
                    it would work the same, it's just overall way more complicated
                    Last edited by gens; 19 November 2014, 08:41 AM.

                    • Originally posted by gens
                      i just realized this morning
                      you are suggesting a systemd specific mechanism (sd_notify) that uses the OO dbus with the reasoning that it is just a couple lines of code
                      thus making it dependent on having two things running that you don't need on a specialized server (dbus and a compliant process tracker aka systemd)
                      and a library (or two)
                      Sure, you'd need DBUS and the process tracker. But since dbus is now even part of the kernel, I wouldn't count that as a huge dependency.


                      Originally posted by gens
                      while the "program does it itself" solution is also short (cca 7-10 simple lines of C) and is not just init/process tracker independent but also OS independent and needs only the kernel
                      I doubt that it's only 10 lines of C code. You'd need to handle a few cases to get it right. Maybe it would make sense to put process restarting into a shared library, since it really wouldn't make sense to repeat mostly the same code over and over again in every daemon out there. But then, you can put this code into a process hypervisor, if you have one anyway.

                      Originally posted by gens
                      not to even go into the discussion that process tracking does not necessarily need to be part of the init
                      (i'm making a process tracker, for fun)
                      I'd love to see the code as I find this topic interesting.

                      Originally posted by gens
                      let's have another quote on complicating things:
                      "Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction."
                      -Albert Einstein
                      I'm all for simple solutions. If somebody comes up with a simpler solution than systemd, that solves the same problems, I'm all for it.
