Blue-green Deployment
You should already be familiar with the dev-stg-prod environments, right? Deploying to prod always means some downtime, followed by an awkward stretch where we watch production for errors. How can we mitigate this problem?
Context
Once you start going deeper into development, you will eventually cross paths with a tool called a version control system, or VCS for short. In the old days, “Subversion” was the go-to tool for this. But after the greatest man of all time, Linus Torvalds, created the tool “Git”, Subversion became a thing of the past.
Git shines best when used in a team setting, don’t you think? Everyone can work in parallel on their own feature branches, and later merge those branches into a complete feature. To take this further, we can also integrate CI/CD into our workflow.
CI (Continuous Integration) reacts to events such as a push or a pull request by checking our code, running our tests, and helping the review process along. After CI comes CD (Continuous Delivery): once the product is tested and built, where should it be deployed? Say you’re building a React project; after CI has tested and built the project into some HTML files, the CD process takes those files and hands them to an NGINX server. All of this happens from a simple git push, fully automatically.
There’s one small catch in CD though, did you notice? To deploy a new version of the project, there’s a short window where the production environment has to go down. For a large enterprise, even that short downtime can mean lost revenue.
What is blue-green?
In a nutshell, the blue-green deployment strategy keeps two environments, which can live on different hosts or in different containers. One of them is the production environment (called blue); for a new version we spin up a fresh staging environment (called green), then run checks and smoke tests to make sure green is stable. If it is, we can ‘slowly’ redirect the traffic to green instead.
The colors are merely a concept, and I found them hard to reason about at first. But once traffic is fully migrated over, green simply becomes the new ‘blue’.
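If it helps, the whole idea fits into a few lines of pseudocode-style Bash. The helper names below are placeholders standing in for the real steps, not actual commands:

# the blue-green idea in a nutshell -- these helpers are placeholders, not real commands
ACTIVE=blue                    # the environment currently serving production traffic
IDLE=green                     # the environment we deploy the new version to

deploy_new_version "$IDLE"     # start the new version next to the old one
smoke_test "$IDLE"             # make sure it is healthy before users ever see it
switch_traffic_to "$IDLE"      # flip the proxy; green has become the new "blue"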
Implementation
In the implementation below, we deploy a Java Spring backend and use a simple Bash script to do the switching, instead of relying on industry software for it.
Step 1. Preparations
The team is using Docker, so the preparation is dead simple: we only need to set up the Docker image and push it to a container registry like GHCR or Docker Hub. You could also just spin up droplets or run the programs directly on the host, but that’s not in scope for this post.
- name: Build and push Docker image (latest + commit-sha)
  id: build-push
  uses: docker/build-push-action@v5
  with:
    context: .
    push: true
    tags: |
      ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
      ${{
        github.event_name == 'push' && github.ref == 'refs/heads/main'
        && format('{0}/{1}:latest', env.REGISTRY, env.IMAGE_NAME)
        || ''
      }}
We use GitHub Actions runners to carry out the CI/CD workflow. For this step, we use Docker’s official build-push action to build the image and push it to GHCR.
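If you have never used that action before, it roughly boils down to the plain Docker commands below. The registry and image names here are placeholders for illustration, not the project’s real values:

# roughly what docker/build-push-action does for us, with placeholder names
REGISTRY=ghcr.io
IMAGE_NAME=my-org/spring-backend
GIT_SHA=$(git rev-parse HEAD)

docker build -t "$REGISTRY/$IMAGE_NAME:$GIT_SHA" -t "$REGISTRY/$IMAGE_NAME:latest" .
docker push "$REGISTRY/$IMAGE_NAME:$GIT_SHA"
docker push "$REGISTRY/$IMAGE_NAME:latest"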
Step 2. Pull
echo "info: pulling $IMAGE:$TAG"
docker login ghcr.io -u "$GITHUB_ACTOR" -p "$GITHUB_TOKEN"
docker pull "$IMAGE:$TAG"
docker pull "$IMAGE"
GHCR and Docker Hub work in much the same way here, but to pull from private repositories you need to log in with credentials. It’s probably better to set up a dedicated DevOps account and keep its credentials in a proper Docker credential store, but simply passing in the actor and token is enough to pull from a private GHCR repository.
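As a small aside, if you do keep a long-lived token on the host, piping it into docker login keeps it out of the process list. This is just the standard Docker CLI variant, assuming the token is exported as GITHUB_TOKEN:

# same login as above, but read the token from stdin instead of a -p flag
echo "$GITHUB_TOKEN" | docker login ghcr.io -u "$GITHUB_ACTOR" --password-stdin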
Step 3. The switcheroo
We should have the latest image version already pulled at this point. Next, we try to migrate over to this version without causing downtime in the production environment. We use NGINX as a reverse proxy, but at the time of writing we haven’t set up mounted volumes for it yet, so for this example we will edit the configuration files directly inside the container.
We just use a mix of awk and sed to change the proxy’s values in place.
# figure out which color is currently live by inspecting the NGINX config
if exec_rev_proxy "grep -sq \"backend-blue\" $NGINX_CONF_PATH"; then
  echo "server is blue"
  echo "starting green staging"
  check_and_start backend-green
  healthcheck $GREEN_URL
  swap $GREEN_URL
else
  echo "server is green"
  echo "starting blue staging"
  check_and_start backend-blue
  healthcheck $BLUE_URL
  swap $BLUE_URL
fi
Notice: the check_and_start function is not a built-in command. It’s a function I wrote that simply checks whether the container name passed in is running; if it is, it deletes it and recreates it with the newest image.
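For completeness, here is a rough sketch of what those helpers could look like. The reverse-proxy container name (rev-proxy), the Docker network (backend-net), and the retry counts are all assumptions for illustration, not the exact code from the project:

# hypothetical sketches of the helpers used above -- names and details are assumptions
exec_rev_proxy() {
  # run a command inside the NGINX container (assumed here to be named "rev-proxy")
  docker exec rev-proxy sh -c "$1"
}

check_and_start() {
  # if the container already exists, remove it, then recreate it from the newest image
  local name="$1"
  if [ -n "$(docker ps -aq -f "name=$name")" ]; then
    docker rm -f "$name"
  fi
  docker run -d --name "$name" --network backend-net "$IMAGE:$TAG"
}

healthcheck() {
  # poll the service until it answers, or give up after roughly a minute
  local url="$1"
  for _ in $(seq 1 30); do
    curl -fsS "$url" > /dev/null && return 0
    sleep 2
  done
  echo "error: $url never became healthy" >&2
  exit 1
}

swap() {
  # rewrite proxy_pass to the new upstream and reload NGINX without dropping connections
  local new_upstream="$1"
  exec_rev_proxy "sed -i 's|proxy_pass http://.*;|proxy_pass $new_upstream;|' $NGINX_CONF_PATH"
  exec_rev_proxy "nginx -s reload"
}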
Afterwords
Looking at the commands above, even in this very simplified form, as part of a whole script: would they be considered difficult to understand? For a newcomer to Bash scripting, would it be a stretch to follow? Another intern, while reviewing this script of mine, did raise exactly that issue: “The script is too complicated”.