Tuesday, February 17, 2015

Trap killing signals inside Docker



When you stop a Docker container with:

sudo docker stop containername

You are sending a SIGTERM signal to the container, or more precisely to the PID 1 process running inside it. If that process has not exited when the grace period ends (10 seconds by default), Docker follows up with SIGKILL. The PID 1 process is the one you ask the container to run when it starts up: either the ENTRYPOINT/CMD command defined in the Dockerfile, or the command passed to docker run.

If the PID 1 process spawns other processes, it has to trap the killing signals and pass them on, so that those processes can shut down gracefully.

Writing software is a very good way to train my brain; it has taught me systems thinking and how to go slow to go fast. I originally tried to tackle this problem directly in Docker and gave up after a few hours of futile attempts. The next day I wrote small programs and, slowly but steadily, solved the problem.

First I wrote a small HelloWorld Java application:

import com.beust.jcommander.JCommander;
import com.beust.jcommander.Parameter;

public class HelloWorld {
       private static volatile boolean keepRunning = true;

       @Parameter(names = { "-msg" })
       private static String msg = "world";

       public static void main(String[] args) throws Exception {

             final HelloWorld hello = new HelloWorld();
             new JCommander(hello, args);

             System.out.println("hello " + msg);

             Runtime.getRuntime().addShutdownHook(new Thread() {
                    public void run() {
                          // wait a while so the sequence of events is easier to follow
                          try {
                                 Thread.sleep(50000);
                          } catch (InterruptedException e) {
                                 e.printStackTrace();
                          }

                          try {
                                 System.out.println("goodbye " + msg);
                          } catch (Exception e) {

                          }
                          keepRunning = false;

                    }
             });

             while (keepRunning) {
                    Thread.sleep(50000000);
             }

       }
}

The Java application prints “hello world” when it starts and hangs until it receives a shutdown signal. When it receives the signal, it waits 50 seconds, prints “goodbye world”, and exits.

hello0.sh starts the java application:

#!/bin/bash
CLASSPATH="HelloWorld.jar"
java -cp "$CLASSPATH" HelloWorld "$@" &


hello0.sh spawns the Java application as a separate process with its own PID. Killing the shell process doesn’t kill the Java process.
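
A quick way to see this without the Java application is the sketch below. It is only an illustration under a few assumptions: sleep stands in for the Java process, the wrapper is an inline bash -c command rather than hello0.sh itself, and the temp directory and variable names are made up for the demo.

```shell
#!/bin/bash
# Sketch only: `sleep` stands in for the Java application.
workdir=$(mktemp -d)

# A wrapper shell that starts a child and waits for it, with no trap and no exec.
bash -c 'sleep 300 & echo $! > "$1/child.pid"; wait' _ "$workdir" &
wrapper=$!

# Wait until the wrapper has recorded its child's PID.
while [ ! -s "$workdir/child.pid" ]; do sleep 0.1; done
child=$(cat "$workdir/child.pid")

# Kill only the wrapper shell.
kill -TERM "$wrapper"
wait "$wrapper" 2>/dev/null || true

# kill -0 sends no signal; it only checks that the process still exists.
if kill -0 "$child" 2>/dev/null; then
    child_survived=yes
else
    child_survived=no
fi
echo "child survived: $child_survived"

# Clean up the orphaned child and the temp directory.
kill -TERM "$child"
rm -rf "$workdir"
```

The wrapper dies on SIGTERM because it has no trap, and nothing passes the signal on to the child.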

If we just want the Java process to be killed when the shell is killed, one simple solution is to invoke the Java program with exec. Here comes hello1.sh:

#!/bin/bash
CLASSPATH="HelloWorld.jar"
exec java -cp "$CLASSPATH" HelloWorld "$@"


With exec, the Java process takes over the life of the shell process: it replaces the shell, no new PID is created, and any signal sent to that PID goes straight to Java.
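
The PID behaviour is easy to check with plain shells. The following is a small sketch, not part of the hello scripts: an outer bash prints its PID and then starts an inner bash, once with exec and once without. The variable names are mine.

```shell
#!/bin/bash
# With exec, the inner bash replaces the outer one, so both lines show the same PID.
with_exec=$(bash -c 'echo $$; exec bash -c "echo \$\$"')

# Without exec, the inner bash is a child process and gets its own PID.
# The trailing `true` keeps the outer bash from exec'ing its last command implicitly.
without_exec=$(bash -c 'echo $$; bash -c "echo \$\$"; true')

# Unquoted on purpose: word splitting puts both PIDs on one line.
echo "with exec:   " $with_exec
echo "without exec:" $without_exec
```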


At first glance, running the Java application in the background gives a similar result, a Java process with no wrapper shell left around it:

java -cp "$CLASSPATH" HelloWorld "$@" &

But there is an important difference. With exec, the Java process replaces the shell process and no new PID is created. When the Java application is started in the background, the shell process starts it, carries on, and exits; the shell process disappears and the Java application keeps running in the background with its own PID.

This has an implication for Docker: if you run processes in the background and the PID 1 process exits, the Docker container dies too. To keep the PID 1 process running, a trick I often use is to tail -f some file, e.g.
ENTRYPOINT (java -cp HelloWorld.jar HelloWorld &) && touch tmp.txt && tail -f tmp.txt

So could “exec” be the answer? No, things are not always that simple. What if I want to run other applications after the HelloWorld application? Suppose I run two HelloWorld applications (hello2.sh):

#!/bin/bash
CLASSPATH="HelloWorld.jar"
exec java -cp "$CLASSPATH" HelloWorld -msg apple
exec java -cp "$CLASSPATH" HelloWorld -msg strawberry

Only the first application gets to run. Because it takes over the life of the shell process, the second exec line is never reached.
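
The same thing can be seen in miniature with the sketch below, where echo stands in for the two Java applications (an assumption made purely to keep the demo self-contained):

```shell
#!/bin/bash
# Sketch only: `echo` stands in for the two Java applications.
# The first exec replaces the shell, so the second command is never reached.
output=$(bash -c 'exec echo apple; exec echo strawberry')
echo "$output"   # prints only "apple"
```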

The solution is:
  •  Keep the shell process running, so it can trap the killing signals
  •  Start application processes in the background, and get their PIDs
  •  When the shell process receives killing signals, kill application processes using their PIDs
Here is hello3.sh:

#!/bin/bash
trap 'shut_down' TERM INT

shut_down() {
    echo "try killing $PID1"
    kill -TERM "$PID1"

    echo "try killing $PID2"
    kill -TERM "$PID2"

    # wait for both applications to finish their shutdown hooks
    wait "$PID1"
    wait "$PID2"

    echo "kill accomplished"
}

CLASSPATH="HelloWorld.jar"

java -cp "$CLASSPATH" HelloWorld -msg apple &
PID1=$!
echo "previous process id is $PID1"

java -cp "$CLASSPATH" HelloWorld -msg strawberry &
PID2=$!
echo "previous process id is $PID2"

wait $PID1
wait $PID2

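echo "termination complete"

When it runs, hello3.sh prints the two PIDs and then blocks in wait. On SIGTERM, shut_down forwards the signal to both applications, waits until each has printed its goodbye message, and prints “kill accomplished”; the script then falls through to “termination complete”.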
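
To try the trap-and-forward pattern without building the Java application, something like the following sketch can be used. It rests on several assumptions of mine: sleep stands in for the two Java processes, forward_demo.sh is a made-up script name, a plain kill -TERM plays the role of docker stop, and the handler exits explicitly instead of falling through as hello3.sh does.

```shell
#!/bin/bash
# Sketch only: `sleep` stands in for the Java applications, and
# forward_demo.sh is a hypothetical name, not a file from this post.
workdir=$(mktemp -d)

cat > "$workdir/forward_demo.sh" <<'EOF'
#!/bin/bash
shut_down() {
    # Forward the signal to both children, then wait for them to exit.
    kill -TERM "$PID1" "$PID2"
    wait "$PID1" "$PID2"
    echo "children stopped"
    exit 0
}
trap 'shut_down' TERM INT

sleep 300 &
PID1=$!
sleep 300 &
PID2=$!
echo "$PID1 $PID2" > "$(dirname "$0")/children.pids"

# wait is interruptible, so the trap runs as soon as the signal arrives.
wait "$PID1" "$PID2"
EOF

# Start the wrapper and wait until it has recorded its children's PIDs.
bash "$workdir/forward_demo.sh" > "$workdir/out.log" &
wrapper=$!
while [ ! -s "$workdir/children.pids" ]; do sleep 0.1; done
read -r child1 child2 < "$workdir/children.pids"

# Play the role of `docker stop`: signal only the wrapper, then wait for it.
kill -TERM "$wrapper"
wait "$wrapper"
wrapper_status=$?

wrapper_output=$(cat "$workdir/out.log")
echo "wrapper said: $wrapper_output (exit status $wrapper_status)"
rm -rf "$workdir"
```

Only the wrapper is signalled here, yet both children are gone by the time it exits, which is the behaviour we want from PID 1 inside the container.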


Let us get a little bit crazier and write hello4.sh, which calls hello3.sh:

#!/bin/bash
trap 'shut_down;wait $PID' TERM INT

shut_down() {
    echo "try killing $PID"
    kill -TERM "$PID"
    wait "$PID"
    echo "kill hello3.sh accomplished"
}

./hello3.sh &

PID=$!
echo "hello3.sh id is $PID"
wait $PID

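echo "terminating hello3 complete"

It runs the same way, one level up: on SIGTERM, hello4.sh forwards the signal to hello3.sh, which forwards it to the two Java applications, and each script finishes only after the processes below it have exited.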


Things in real life are even crazier. In my case, the PID 1 process in the container is a Python application, which in turn invokes a shell, which in turn invokes two other shells. So the game of trapping and killing has to be passed from one to another. The basic principles stay the same.

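To keep the Python application running and passing on killing signals:

import signal
import subprocess
import sys
import time

# Start hello4.sh directly, so proc.pid is the PID of the script itself.
proc = subprocess.Popen(['./hello4.sh'])

def signal_handler(signum, frame):
    print('Exiting...')
    # Forward SIGTERM to hello4.sh and wait for it to finish shutting down.
    proc.send_signal(signal.SIGTERM)
    proc.wait()
    sys.exit(0)

signal.signal(signal.SIGTERM, signal_handler)

# Keep PID 1 alive until a signal arrives.
while True:
    time.sleep(30 * 24 * 60 * 60)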




