New /proc/pid/kill Interface Proposed For Killing Linux Processes

  • #21
    Originally posted by Emmanuel Deloget

    kill (be it the program, the shell builtin or the syscall) has always been able to send any signal to a target process or group of processes. That might be a good reason why the name stuck.
    Nope, back in Unix V3 the kill syscall only killed the process: https://minnie.tuhs.org//cgi-bin/utr...an/man2/kill.2 It wasn't until V4 that it gained the ability to send other signals: https://minnie.tuhs.org//cgi-bin/utr...an/man2/kill.2 At that point they also added the signal syscall to register a signal handler.

    And when POSIX was codified they kept the name due to:

    There is some belief that the name kill() is misleading, since the function is not always intended to cause process termination. However, the name is common to all historical implementations, and any change would be in conflict with the goal of minimal changes to existing application code.



    • #22
      Originally posted by stibium
      How is this race-condition-free, again? If you are sending a signal from a script using this interface, it's literally no different than using the kill command. Correct me if I'm wrong, but the only way to make certain the signal gets to the correct process is to send it to the file handle, as others have mentioned.

      A better (or perhaps equally dumb and useless) idea would be to have PID hashes, so you could reference the hash and know you are killing the correct process.
      It's race-condition-free in the context of inspecting the /proc/pid/ details and then killing the process, since you act on an already open file descriptor. So if you simply do "touch /proc/121/kill", then yes, you are susceptible to a race condition. But if you open /proc/121, inspect the various data in that tree and then decide to kill the process, you are safe, since the kernel maintains the link between the process and the file descriptor.



      • #23
        Originally posted by stibium
        <snip>
        I'm not certain about this but...

        You could change the working directory into the /proc/PID/ directory. From within that directory, you can confirm you are dealing with the process you expect (e.g. by examining the cmdline entry). Now you can kill the process by sending a signal to the 'kill' file within that directory, referencing it by its relative path. Even if the process is killed and a new process is created with the same PID before you send the kill signal, you shouldn't have a problem.

        This isn't quite right, but should illustrate the idea I'm thinking of:

        Terminal 1: bash:
        Code:
        cd ~
        mkdir tmp
        cd tmp
        ls -id .
        Terminal 2: bash:
        Code:
        cd ~
        rmdir tmp
        mkdir tmp
        cd tmp
        ls -id .
        Terminal 1: bash:
        Code:
        ls -id .
        # Note the same inode that was previously printed in terminal 1 is printed again.
        # This bash instance has remained inside the deleted tmp dir.
        # Now imagine if this tmp dir was a procfs process dir for a process that got killed
        # before you issued your kill signal.

        See what I'm thinking of?
        Last edited by cybertraveler; 30 October 2018, 08:41 PM.
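A minimal sketch of the working-directory idea above in portable sh, assuming Linux procfs. The helper name `cmdline_here` is hypothetical, and the final write in the usage comment is left commented out because /proc/PID/kill is only the interface proposed in this thread, not something current kernels provide; writing a signal number to it is likewise an assumption about its format.

```shell
#!/bin/sh
# Hypothetical helper: print the cmdline of the process whose /proc/PID
# directory is the current working directory. The NUL separators in
# cmdline become spaces and the trailing space is stripped. Once that
# process has been reaped, looking up "cmdline" inside the stale
# directory fails, so the function returns non-zero instead of reading
# some other process that reused the PID number.
cmdline_here() {
    cmd=$(tr '\0' ' ' < cmdline) || return 1
    printf '%s\n' "${cmd% }"
}

# Usage sketch, run in a subshell so the cd does not leak:
# (
#     cd "/proc/$pid" || exit 1                      # pin the process
#     [ "$(cmdline_here)" = "sleep 30" ] || exit 1   # confirm identity
#     echo 15 > kill   # HYPOTHETICAL: the proposed /proc/PID/kill file
# )
```

The working directory of the subshell plays the role of the open file descriptor: every relative path is resolved against the directory that was entered, not against whatever process currently owns that PID number.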



        • #24
          I'm a total beginner, but I must say I have learned about processes in this thread. I thought one just uses killall, but that kills every process running that command, which might be undesirable if your command has several processes running.

          good stuff



          • #25
            Originally posted by Weasel
            Why fuck them? They can't learn the word "kill" or what?
            Because the name "kill" covers only ONE of the signals you can send with it, and does not convey that it can send other signals too.

            Calling it some random fantasy name like "Weasel", leaving everything it does to the documentation and requiring people to RTM, would have been less misleading.
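A minimal sh sketch of the point above: `kill` delivers whatever signal it is asked to, and a process that handles that signal is not terminated. Here the shell sends SIGUSR1 to itself; the variable name `got_usr1` is only illustrative.

```shell
#!/bin/sh
# Install a handler (trap) for SIGUSR1, then use kill to send SIGUSR1 to
# this very shell ($$). SIGUSR1's default action would terminate the
# shell; with the trap installed, the handler sets the flag instead and
# the script carries on.
got_usr1=no
trap 'got_usr1=yes' USR1
kill -s USR1 "$$"
echo "got SIGUSR1: $got_usr1"
trap - USR1   # restore the default action for SIGUSR1
```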



            • #26
              Really this is something historically stupid.

              Yes, "everything is a file" is a very old Unix principle. Except that in POSIX, everything is a file except process IDs, user IDs, group IDs...

              Pretty much everywhere you find these unique ID values in POSIX, you have a place where a race condition can happen.

              There is a separate debate about how kill should be made safe. One idea is that /proc/pid becomes the file handle: you open that handle and then send signals to it. Of course, even if this solution becomes the norm, all your sysvinit scripts will still be broken.

              It would be really useful if it were generic, because this would allow more of the IPC messaging systems to be used safely. One of the problems with sending POSIX signals to another application is not being sure that you have in fact sent the signal to the one you intended.

              Please note the PID problem is a little larger than that: when you handle a signal, your response also goes to a PID that can race. The reality is that if this PID problem can be fixed, a generic system for messaging between applications could be added as an extension to POSIX signals. Notice that the structure already contains a file handle. So passing blocks of memory between applications by file handle should be totally possible once the PID problem is sorted out. Of course, for security you then have to sort out the problem of processes whose UID has changed.



              • #27
                F.Ultra is right. Here's a simplified example of what can happen in theory. You have an app that scans for a process to kill. Let's say it checks the process's cmdline, for simplicity.

                Possible sequence of events (race condition):
                • App finds process pid 42 with specific cmdline. It hasn't sent kill yet, so far it has only scanned.
                • Another app sent the kill signal and killed process pid 42 while your app was inspecting its cmdline.
                • Kernel happens to spawn a new process with pid 42 while first app is still scanning the cmdline (maybe it's a super slow scan?).
                • Your app now confirmed it should kill process pid 42. It sends signal to pid 42, which now kills the newly spawned process. RIP.


                With new method:
                • App finds process pid 42 with specific cmdline, but it still needs to scan it. It hasn't sent kill yet, so far it has only scanned a bit. It keeps an open file descriptor to /proc/42 at least.
                • Another app sent the kill signal and killed process pid 42 while your app was inspecting its cmdline. Due to how Unix works, the fd is still valid but it can't be accessed via filename anymore.
                • Kernel happens to spawn a new process with pid 42 while first app is still scanning the cmdline (maybe it's a super slow scan?).
                • Your app now confirmed it should kill process pid 42. However, it opens "kill" using its open file descriptor and sends the signal, which does nothing because it was already killed (or fails to open it). No race condition.
                That said the likelihood that a pid will be reused so fast is extremely, extremely low, so it's more academic than useful in practice...
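The first sequence is the usual check-then-kill pattern. A minimal sh sketch of it, for contrast, assuming Linux procfs; `kill_if_cmdline` is a hypothetical name, and the comment marks the window described above.

```shell
#!/bin/sh
# Hypothetical helper implementing the RACY pattern: inspect /proc/PID by
# number, then signal the same number afterwards.
# Returns non-zero if the process is gone or its cmdline does not match.
kill_if_cmdline() {
    pid=$1
    expected=$2
    cmd=$(tr '\0' ' ' < "/proc/$pid/cmdline") || return 1
    [ "${cmd% }" = "$expected" ] || return 1
    # RACE WINDOW: if process $pid exits here and the kernel hands the
    # number $pid to a new process, the kill below hits the new process.
    kill "$pid"
}
```

Nothing ties the `kill` to the process whose cmdline was checked except the PID number, which is exactly what the open-file-descriptor approach fixes.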



                • #28
                  Originally posted by Weasel
                  That said the likelihood that a pid will be reused so fast is extremely, extremely low, so it's more academic than useful in practice...
                  Depends on the server. It's way worse on 32-bit Linux systems, with a default maximum PID of 32767. 64-bit allows PIDs up to 4194303.

                  32-bit is really bad, particularly when you consider that Apache can burn through your PID allocations at over 100000 an hour. With 32767 PIDs, if in theory none of them stayed in use, you would pass through every PID number roughly every 20 minutes. In reality, because some of those PID numbers remain allocated, you are looking at closer to every 10 to 15 minutes on a busy server. So on 32-bit systems with massive PID-consuming workloads, it's hell.

                  The other workloads that are hell like this are build servers. On systems with massive core counts, even the 64-bit range of 4194303 PIDs can be chewed through every hour, on the hour.

                  On a desktop or a lightly loaded server, hitting this race condition is rare. On a 32-bit system, rolling the dice every 10 to 15 minutes will eventually get you.
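The back-of-the-envelope numbers above can be checked with shell arithmetic. A minimal sketch; `pid_wrap_minutes` is a hypothetical name, and it assumes the best case where every PID number is free, so real wrap times are shorter.

```shell
#!/bin/sh
# Hypothetical helper: minutes needed to cycle through the whole PID space
# at a given rate of new processes per hour (integer division, rounds down).
# Assumes every PID number is free; PIDs held by long-lived processes
# shorten the real cycle.
pid_wrap_minutes() {
    pids=$1       # number of usable PID numbers, e.g. 32767
    per_hour=$2   # new processes per hour, e.g. 100000
    echo $(( pids * 60 / per_hour ))
}

pid_wrap_minutes 32767 100000      # 32-bit example above: prints 19
pid_wrap_minutes 4194303 4194303   # 64-bit range used up hourly: prints 60
```

On a given machine, the current limit is in /proc/sys/kernel/pid_max (one more than the highest PID the kernel will hand out), so `pid_wrap_minutes "$(cat /proc/sys/kernel/pid_max)" 100000` gives a rough figure for that box.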

                  Originally posted by Weasel
                  Kernel happens to spawn a new process with pid 42 while first app is still scanning the cmdline (maybe it's a super slow scan?).
                  What will normally cause this is a process with real-time or higher priority than the scan, which runs at lower priority. Your scan is preempted, and a high-priority process doing something important is started with the recently freed PID number.

                  Yes, the race condition is rare, but when you do hit it, the affected process is going to be one you have given higher priority to, which is most likely something you really don't want killed.

                  I would say it's close to academic for lightly loaded servers and desktops. For something like Google, Facebook, or massive CI servers (all using a lot of processes and PID numbers), it is going to disrupt things about once every 10 years per server. And when you are running 100,000-plus servers, that becomes something happening quite a few times a day all over the place, if you do nothing to contain the race condition, as with sysvinit.

                  This PID race condition is why Facebook and others were so interested in systemd. The cgroup/namespace around each service also prevents killing another service's processes by mistake. A process inside a PID namespace can only kill processes inside that namespace. So there is already a half-fix for this problem.

                  Of course fixing this up properly allows finer control.



                  • #29
                    Originally posted by cybertraveler

                    I'm not certain about this but...

                    You could change the working directory into the /proc/PID/ directory. From within that directory you can confirm you are dealing with the process you expect (e.g. by examining the cmdline entry). Now you can kill the process by sending a signal to the 'kill' file within that directory by referencing it by the relative path. Even if the process is killed and a new process is created with the same PID before you send the kill signal, you shouldn't get a problem.

                    This isn't quite right, but should illustrate the idea I'm thinking of:

                    <snip>

                    See what I'm thinking of?
                    I do. Decent, simple solution, although it may be shell-specific whether or not this would work in practice. Thinking about it I believe it would be best to just work with the fd directly, which essentially functions as a hash anyway. My bash-fu just isn't on the level where I know how to do that without looking it up.
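The fd-based variant can also be done from a plain shell, at least for the inspection part. A minimal sketch in portable sh, assuming Linux procfs; the helper names and the choice of fd 9 are hypothetical, and it relies on /proc/self/fd/9 leading back to the directory that was opened rather than being re-resolved by PID number. The kill write in the usage comment targets the proposed file and is hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers: open /proc/PID once as fd 9, then do every later
# access through that descriptor instead of through the PID number.

# Pin process $1 by opening its /proc directory read-only as fd 9.
# The -d check is there because a failed redirection on `exec` can make a
# non-interactive POSIX shell exit.
pin_pid() {
    [ -d "/proc/$1" ] || return 1
    exec 9< "/proc/$1"
}

# Print the pinned process's cmdline (NULs turned into spaces). The path
# goes through the open descriptor, so once the pinned process has been
# reaped this fails, even if its PID number now belongs to a new process.
pinned_cmdline() {
    cmd=$(tr '\0' ' ' < /proc/self/fd/9/cmdline) || return 1
    printf '%s\n' "${cmd% }"
}

# Drop the pin.
unpin_pid() {
    exec 9<&-
}

# Usage sketch:
#   pin_pid "$pid" && [ "$(pinned_cmdline)" = "sleep 30" ] &&
#       echo 15 > /proc/self/fd/9/kill   # HYPOTHETICAL proposed file
#   unpin_pid
```

Opening the directory first and only then inspecting it is what closes the window: the check and the eventual signal both go through the same pinned handle.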



                    • #30
                      Originally posted by starshipeleven
                      Because the name "kill" covers only ONE of the signals you can send with it, and does not convey that it can send other signals too.

                      Calling it some random fantasy name like "Weasel", leaving everything it does to the documentation and requiring people to RTM, would have been less misleading.
                      This. It should be called "signal" or something generic. In fact, the patch should probably be rejected or given a healthy dose of RFC criticism considering the author doesn't seem to understand that distinction.

