This blog post explains how to configure Docker service health checks using native docker service commands.
Docker swarm is a container orchestration tool similar to Kubernetes, OpenShift, ECS, and EKS, and it ships as part of Docker Engine. Read more about swarm in the official Docker docs.
Create swarm cluster
A swarm cluster contains at least one manager node and optional worker nodes. This blog post focuses on configuring health checks rather than on swarm cluster creation. You can read more about cluster creation in the Docker documentation.
Create a new swarm cluster and initialize it
$ docker swarm init --advertise-addr 192.168.99.100
Swarm initialized: current node (dxn1zf6l61qsb1josjja83ngz) is now a manager.
To add a worker to this swarm, run the following command:
docker swarm join \
--token SWMTKN-1-49nj1cmql0jkz5s954yi3oex3nedyz0fb0xx14ie39trti4wxv-8vxv8rssmk743ojnwacrr2e7c \
192.168.99.100:2377
To add a manager to this swarm, run 'docker swarm join-token manager' and follow the instructions.
List cluster nodes using docker node ls command
$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
dxn1zf6l61qsb1josjja83ngz * manager1 Ready Active Leader
Create Service
- The Docker image used in this article is built from the GitHub project DockerHealthCheckDemo.
- SSH into the manager node and run the following command to create a new service, employee_service, with 2 replicas (containers). This is just a basic command; more details are added later.
$ docker service create --replicas 2 -p 8080:8080 --name employee_service employee_springboot:latest
The service uses the Spring Boot Docker image employee_springboot:latest and publishes port 8080.
You can always list the available services using the docker service ls command
$ docker service ls
ID NAME SCALE IMAGE
9uk4639qpg7n employee_service 2/2 employee
Continuous deployment
Update the service by bringing up new containers from the updated image, with a delay of 300 seconds between container updates. Continuous deployment can be achieved using the following command; each parameter is explained after it.
$ docker service update --force --detach=false --update-parallelism=1 --update-delay=300s --update-failure-action=rollback --update-order=start-first employee_service
force → Update the service even if no changes require it
detach=false → Do not exit immediately; wait for the service to converge
update-parallelism=1 → Update one container at a time
update-delay=300s → Wait time between container updates (optional)
update-failure-action=rollback → Roll back to the previous state if the update fails
update-order=start-first → Start the new container before stopping the existing one
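With detach=false the command waits for the service to converge. When the update runs detached, or from a deployment script, the outcome can be checked afterwards. The following is a minimal sketch, assuming the service's UpdateStatus.State field is populated (a service that has never been updated may not have it, in which case the inspect call can print an error instead). The function names update_state and update_succeeded are illustrative, not part of Docker.

```shell
#!/bin/sh
# Sketch: report the outcome of the most recent rolling update of a service.

# Print the UpdateStatus.State field of the service, for example
# "updating", "completed", "paused" or "rollback_completed".
update_state() {
    docker service inspect --format '{{.UpdateStatus.State}}' "$1"
}

# Return 0 only when the last update finished without a pause or rollback.
update_succeeded() {
    [ "$(update_state "$1")" = "completed" ]
}
```

For example, `update_succeeded employee_service || echo "update did not complete"` could follow a detached update in a deployment script.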
However, we have a problem here. By default, a Docker service treats a new container as ready as soon as it starts, irrespective of the application status. So HTTP requests from clients are forwarded to the new container before the application has come up, and they fail with an error.
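One way to observe this problem is to send requests to the service while an update is running and count how many fail. The sketch below is only an illustration: count_failures is a hypothetical helper, PROBE_DELAY is a hypothetical environment variable for the pause between requests, and the URL is whatever address the service is published on.

```shell
#!/bin/sh
# Sketch: send COUNT requests to URL and print how many of them failed.
# Usage: count_failures URL COUNT
count_failures() {
    cf_url=$1
    cf_count=$2
    cf_failed=0
    cf_i=0
    while [ "$cf_i" -lt "$cf_count" ]; do
        # -f makes curl exit non-zero on HTTP error responses;
        # --max-time bounds each request to 2 seconds.
        if ! curl -fsS --max-time 2 -o /dev/null "$cf_url" 2>/dev/null; then
            cf_failed=$((cf_failed + 1))
        fi
        cf_i=$((cf_i + 1))
        # Pause between requests (1 second unless PROBE_DELAY is set).
        sleep "${PROBE_DELAY:-1}"
    done
    echo "$cf_failed"
}
```

Running something like `count_failures http://192.168.99.100:8080/api/v1/health/find/status 60` in a second terminal during an update should print a non-zero count if requests reach a container whose application is not ready yet. The address here assumes the manager IP from the swarm init output and the health endpoint of the demo application.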
Container Health check
To prevent this error, we need to add a custom health check for the container. There are a couple of ways to do it:
- Use the HEALTHCHECK instruction (command) that Docker provides for the Dockerfile (preferred way)
- Or add the health-cmd flag to the docker service update command
Both do the same thing; the only difference is where the configuration lives.
HEALTHCHECK Instruction
First, let's look at an example HEALTHCHECK instruction below
# Use the AdoptOpenJDK 11 JRE (OpenJ9) image
FROM adoptopenjdk:11-jre-openj9-bionic
# Copy JAR file from local machine to container
COPY target/*.jar app.jar
# Expose the port
EXPOSE 8080
# Health check endpoint
HEALTHCHECK --start-period=2m --interval=30s --timeout=5s CMD curl -f http://localhost:8080/api/v1/health/find/status | grep UP || exit 1
# Start the Spring Boot application
CMD ["java","-jar","app.jar"]
As shown in the Dockerfile, the container is given a 2-minute start-up period, and the check runs every 30 seconds with a 5-second timeout on each try.
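The one-line check in the Dockerfile relies on grep's exit status: if curl fails, its output is empty, so grep finds no match and the command exits with 1. The same logic can be written out step by step, which may be easier to read and extend. This is a sketch only; check_health is a hypothetical function name, and the default URL is the endpoint used in the Dockerfile.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) only if the health endpoint responds
# without an HTTP error and the response body contains "UP".
check_health() {
    ch_url=${1:-http://localhost:8080/api/v1/health/find/status}
    # -f fails on HTTP error responses, -s hides the progress meter,
    # --max-time gives up after 5 seconds.
    ch_body=$(curl -fs --max-time 5 "$ch_url") || return 1
    # grep -q only sets the exit status; any non-match becomes status 1,
    # which Docker interprets as unhealthy.
    printf '%s\n' "$ch_body" | grep -q UP || return 1
}
```

If this were saved as a script inside the image (an assumption, the demo project does not do this), the HEALTHCHECK instruction could call the script instead of the inline pipeline.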
The HEALTHCHECK instruction accepts 4 options:
- --interval=DURATION (default: 30s)
- --timeout=DURATION (default: 30s)
- --start-period=DURATION (default: 0s)
- --retries=N (default: 3)
The health check first runs interval seconds after the container is started, and then again interval seconds after each previous check completes. If a single run of the check takes longer than timeout seconds, the check is considered to have failed. After retries consecutive failures, the container is considered unhealthy. The start period provides initialization time for containers that need time to bootstrap; probe failures during that period are not counted towards the maximum number of retries. However, if a health check succeeds during the start period, the container is considered started and all consecutive failures will be counted towards the maximum number of retries.
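To get a feel for how these options combine, the arithmetic can be sketched as follows. This is a rough estimate under a simplifying assumption: every probe fails and takes the full timeout, and scheduling jitter and the probe that straddles the end of the start period are ignored. The function name unhealthy_after is hypothetical.

```shell
#!/bin/sh
# Sketch: rough estimate, in seconds, of how long a container that never
# becomes healthy keeps running before it is marked unhealthy.
# Usage: unhealthy_after START_PERIOD INTERVAL TIMEOUT RETRIES
unhealthy_after() {
    # Failures inside the start period are not counted; after that, each
    # counted probe costs one interval plus up to one timeout.
    echo $(( $1 + $4 * ($2 + $3) ))
}
```

With the Dockerfile values from this post (start period 120s, interval 30s, timeout 5s) and the default of 3 retries, `unhealthy_after 120 30 5 3` prints 225, so a broken container would be detected after roughly four minutes.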
health-cmd flag
The second way is to pass the health-cmd flag to docker service update
$ docker service update --force --detach=false --update-parallelism=1 --update-delay=300s --update-failure-action=rollback --update-order=start-first --health-cmd="curl -f http://localhost:8080/api/v1/health/find/status | grep UP || exit 1" --health-start-period=2m --health-interval=30s --health-timeout=5s employee_service
We pass the same values as in the HEALTHCHECK instruction in the Dockerfile:
- --health-cmd="curl -f http://localhost:8080/api/v1/health/find/status | grep UP || exit 1"
- --health-start-period=2m
- --health-interval=30s
- --health-timeout=5s
Testing
1. Clone the repo and build the image
$ git clone https://github.com/pavankjadda/DockerHealthCheckDemo.git
$ cd DockerHealthCheckDemo
$ docker build -t employee_springboot .
2. Create new service
$ docker service create --replicas 2 -p 8080:8080 --name employee_service employee_springboot:latest
Once the service is up, docker ps should show two containers (1a995cde59cc and e7b913e2c3f4 in this example), and their status should be healthy
3. Update the service
$ docker service update --force --detach=false --update-parallelism=1 --update-delay=120s --update-failure-action=rollback --update-order=start-first employee_service
The docker service update command brings up another container, 665ee54f8be, while requests are still served by the existing containers. The status of the new container shows as health: starting. You can read more about container health status here
After a minute, container 665ee54f8be comes up and container e7b913e2c3f4 is taken down. After 2 minutes, both old containers have been taken down and the new containers are serving user requests
Note: We can remove the update-delay flag from the docker service update command unless we specifically want a delay between containers, as in an AWS ECS blue/green deployment.
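The health states seen during testing (starting, healthy) can also be read from a script rather than from docker ps. The sketch below polls a single container until it reports healthy. It assumes the container has a health check configured; without one, the State.Health field is missing and the inspect call may fail. The names health_of and wait_healthy, and the MAX_TRIES and POLL_DELAY environment variables, are hypothetical.

```shell
#!/bin/sh
# Print the health status of one container: starting, healthy or unhealthy.
health_of() {
    docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Poll a container until it is healthy. Return 1 as soon as it reports
# unhealthy, or after MAX_TRIES polls (default 30) without becoming healthy.
wait_healthy() {
    wh_tries=0
    while [ "$wh_tries" -lt "${MAX_TRIES:-30}" ]; do
        case "$(health_of "$1")" in
            healthy)   return 0 ;;
            unhealthy) return 1 ;;
        esac
        wh_tries=$((wh_tries + 1))
        # Wait between polls (10 seconds unless POLL_DELAY is set).
        sleep "${POLL_DELAY:-10}"
    done
    return 1
}
```

For example, `wait_healthy 665ee54f8be && echo "new container is healthy"` could be run on the node that hosts the new container.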
Conclusion
Using the docker service update command and the Dockerfile HEALTHCHECK instruction, we can continuously deploy applications without interrupting the user workflow.