Bootstrapping an auto-scaling web application on AWS with Kubernetes

Let’s create a state-of-the-art deployment pipeline for cloud-native applications. In this guide, I’ll be using Kubernetes on AWS to bootstrap a load-balanced, static-files-only web application. This is serious overkill for such an application, but it showcases several things you will need when designing a system like this for more sophisticated applications. This guide assumes you are on macOS and are familiar with both Homebrew and AWS.

At the end of this guide, we will have a Kubernetes cluster onto which our application is automatically deployed with each check-in. The application will be load balanced (running in two containers) and health-checked. Additionally, different branches will get different endpoints and will not affect each other.

[GIF demonstrating automatic scaling of the cluster]

About the tools

Kubernetes
A Google-developed container cluster scheduler

Terraform
A HashiCorp-developed infrastructure-as-code tool

Wercker
An online CI service, specifically for containers

Getting to know Terraform

To bootstrap Kubernetes, I will be using Kops, which under the hood can generate Terraform configuration for the cluster it creates. First, I made sure Terraform was up to date

brew update
brew install terraform
Already up-to-date.

To make sure my AWS credentials (saved in $HOME/.aws/credentials) were picked up by Terraform, I created an initial, bare-bones Terraform config (pretty much taken verbatim from the Terraform Getting Started guide)

provider "aws" {}

resource "aws_instance" "example" {
  ami           = "ami-0d729a60"
  instance_type = "t2.micro"
}
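Since the provider block leaves the region unset, Terraform falls back to the usual AWS environment variables and the shared credentials/config files. If it complains about a missing region, exporting one explicitly does the trick (a minimal sketch; the values are just examples):

export AWS_DEFAULT_REGION=us-east-1
export AWS_PROFILE=default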

I then planned it via

terraform plan 1-initial

and applied it

terraform apply 1-initial

That looks promising, and a quick glance at the AWS console confirmed that Terraform had indeed bootstrapped a t2.micro instance in us-east-1. I destroyed it right away to avoid incurring costs via

terraform destroy -force 1-initial

Alright, Terraform looks good, let’s get to work

Now that I have a basic understanding of Terraform, let’s put it to use. As mentioned earlier, we are going to use Kops to bootstrap our cluster, so let’s install it following the instructions in the project’s GitHub repo.

export GOPATH=$HOME/golang/
mkdir -p $GOPATH
go get -d k8s.io/kops

This timed out for me several times. Running go get with the -u flag let me rerun the same command over and over until it succeeded. This happened while my ISP was having some trouble, so your mileage may vary.

Afterwards, I built the binary from the checked-out source

cd $GOPATH/src/k8s.io/kops
make

Also, I made sure I already had a hosted zone set up via the AWS console (mine was already in place, since I use Route 53 as my domain registrar).
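If you do not have a hosted zone yet, it can also be created from the command line instead of the console (a sketch; the zone name is just an example parent domain for the cluster name used below, and the caller reference merely has to be unique):

aws route53 create-hosted-zone --name k8s.orovecchia.com --caller-reference kops-zone-$(date +%s)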

After the compilation was done, I instructed Kops to output Terraform files for the cluster via

~/golang/bin/kops create cluster --zones=us-east-1a dev.k8s.orovecchia.com --state=s3://oro-kops-state
~/golang/bin/kops update cluster --target=terraform dev.k8s.orovecchia.com --state=s3://oro-kops-state

This creates the Terraform files in out/terraform, sets up the Kubernetes config in ~/.kube/config, and stores the Kops state inside an S3 bucket. This has the benefit that a) other team members can (potentially) modify the cluster and b) the infrastructure itself can be safely stored within a repository.
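One small prerequisite: the S3 bucket for the Kops state store has to exist before running the commands above. If it does not yet, creating it is a one-liner (assuming the AWS CLI is configured):

aws s3 mb s3://oro-kops-state --region us-east-1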

Let’s spawn the cluster

cd out/terraform
terraform plan
terraform apply

And that is pretty much all there is to it; I was now able to connect to Kubernetes via kubectl.

brew install kubectl
kubectl cluster-info
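It can take a few minutes until all instances have joined the cluster; a quick way to verify is to list the nodes and wait for them to report Ready:

kubectl get nodes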

Now, on to creating the application:

Creating our application

For our demo application, we are going to use a simple (static) web page. Let’s bundle this into a Docker container. First, our site itself:

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Hello there</title>
  </head>
  <body>
 Automation for the People 
  </body>
</html>

Not very sophisticated, but it gets the job done. Let’s use Go as our HTTP server (again, this is just for demonstration purposes; if you are really thinking about doing something THAT complicated just to serve a static web page, have a look at this blog post instead. Still complex, but far less convoluted.)

package main

import (
  "log"
  "net/http"
)

func main() {
  // Serve everything below ./static at the web root.
  fs := http.FileServer(http.Dir("static"))
  http.Handle("/", fs)
  log.Println("Listening on 8080...")
  log.Println(http.ListenAndServe(":8080", nil))
}

And our build instructions, courtesy of Wercker

box: golang
dev:
  steps:
    - setup-go-workspace:
        package-dir: ./

    - internal/watch:
        code: |
          go build -o app ./...
          ./app
        reload: true

build:
  steps:
    - setup-go-workspace:
        package-dir: ./

    - golint

    - script:
        name: go build
        code: |
          CGO_ENABLED=0 go build -a -ldflags '-s' -installsuffix cgo -o app ./...

    - script:
        name: go test
        code: |
          go test ./...

    - script:
        name: copy to output dir
        code: |
          cp -r source/static source/kube.yml app $WERCKER_OUTPUT_DIR

Running the dev pipeline locally is then a single command:

wercker dev --publish 8080

This wercker file + command will automatically reload our local dev environment when we change things, so it will come in quite handy once we start developing new features. I can now access the page running on localhost:8080

GET http://localhost:8080
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 155
Content-Type: text/html; charset=utf-8
Last-Modified: Thu, 29 Sep 2016 19:23:33 GMT
Date: Thu, 29 Sep 2016 19:23:40 GMT

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Hello there</title>
  </head>
  <body>
 Automation for the People 
  </body>
</html>

Also, a wercker build run triggers the complete build pipeline, including linting and testing (tests we do not have yet).
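The same pipeline can also be exercised locally before pushing, which is handy for catching lint or build failures early (assuming the wercker CLI is installed):

wercker build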

Now, building locally is nice, but we’d like a complete pipeline so that our CI server also does the builds. Thankfully, with our wercker.yml we have already done most of the work. All that is left is to add our repository to our Wercker account, and builds should trigger automatically after a git push.

Let’s have a look via the REST API (the most important part being that the run passed)

GET https://app.wercker.com/api/v3/runs/57ed6b9318c4c70100453a9e

Building our deployment pipeline

Now that we’ve built our application, we still need a place to store the artifacts. For this, we are going to use Docker Hub. I’ve added the deploy step to the wercker.yml and set the two environment variables, USERNAME and PASSWORD, via the Wercker GUI.

deploy-dockerhub:
  steps:
    - internal/docker-scratch-push:
        username: $USERNAME
        password: $PASSWORD
        tag: latest, $WERCKER_GIT_COMMIT, $WERCKER_GIT_BRANCH
        cmd: ./app
        ports: 8080
        repository: oronu/nginx-simple-html
        registry: https://registry.hub.docker.com

However, at first I was using the internal/docker-push step, which resulted in a whopping 256 MB container. After reading up on minimal containers, I switched to docker-scratch-push, which produced a roughly 1 MB image instead. I also forgot to include the static files at first, which I remedied afterwards.
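To see the difference yourself, pulling the image and checking its size locally is a quick sanity check (assuming Docker is running locally):

docker pull oronu/nginx-simple-html:latest
docker images oronu/nginx-simple-html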

Now all that’s left is to publish this to our Kubernetes cluster.

Putting everything together

For the last step, we add the deployment to our Kubernetes cluster as a pipeline in the wercker.yml. It again needs several environment variables, which are set in the Wercker GUI.

kube-deploy:
  steps:
    - script:
        name: generate kube file
        code: |
          eval "cat <<EOF
          $(cat "$WERCKER_SOURCE_DIR/kube.yml")
          EOF" > kube-gen.yml
          cat kube-gen.yml
    - kubectl:
        server: $KUBERNETES_MASTER
        username: $KUBERNETES_USERNAME
        password: $KUBERNETES_PASSWORD
        insecure-skip-tls-verify: true
        command: apply -f kube-gen.yml

Additionally, I’ve added the kube.yml file which contains service and deployment definitions for Kubernetes.

---
kind: Service
apiVersion: v1
metadata:
  name: orohttp-${WERCKER_GIT_BRANCH}
spec:
  ports:
    - port: 80
      targetPort: http-server
      protocol: TCP
  type: LoadBalancer
  selector:
    name: orohttp-${WERCKER_GIT_BRANCH}
    branch: ${WERCKER_GIT_BRANCH}
    commit: ${WERCKER_GIT_COMMIT}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: orohttp-${WERCKER_GIT_BRANCH}
spec:
  replicas: 2
  template:
    metadata:
      labels:
        name: orohttp-${WERCKER_GIT_BRANCH}
        branch: ${WERCKER_GIT_BRANCH}
        commit: ${WERCKER_GIT_COMMIT}
    spec:
      containers:
      - name: orohttp-${WERCKER_GIT_BRANCH}
        image: oronu/nginx-simple-html:${WERCKER_GIT_COMMIT}
        ports:
        - name: http-server
          containerPort: 8080
          protocol: TCP

Now, unfortunately, Kubernetes does not yet support parameterization inside its template files. This can be remedied by generating the files with the following script inside the wercker.yml

eval "cat <<EOF
$(cat "$1")
EOF"

This definition results in every commit on every branch being deployed automatically. Different branches, however, get different load balancers and therefore different DNS names.

And just to make sure, let’s check the actual deployed application:

kubectl get svc -o wide
NAME             CLUSTER-IP      EXTERNAL-IP                                                               PORT(S)   AGE       SELECTOR
kubernetes       100.64.0.1      <none>                                                                    443/TCP   55m       <none>
orohttp-master   100.71.47.208   af689c86086eb11e6a0a50e4d6ac19b8-1846451599.us-east-1.elb.amazonaws.com   80/TCP    8m        branch=master,commit=c9c84f1b9b479d2133541b2f3065af1d86559c94,name=orohttp-master
GET af689c86086eb11e6a0a50e4d6ac19b8-1846451599.us-east-1.elb.amazonaws.com
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 155
Content-Type: text/html; charset=utf-8
Last-Modified: Fri, 30 Sep 2016 08:57:28 GMT
Date: Fri, 30 Sep 2016 09:06:22 GMT

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Hello there</title>
  </head>
  <body>
 Automation for the People 
  </body>
</html>

Testing and health checks

Up until now, we have only been hoping that our infrastructure and application are working. Let’s make sure of it. However, instead of focusing on (classic) infrastructure tests, let’s first make sure that what actually matters is working: the application itself. For this, we can put our pipeline to the test. Let’s start working on our new feature:

git flow feature start init-healthcheck

Summary of actions:
- A new branch 'feature-init-healthcheck' was created, based on 'develop'
- You are now on branch 'feature-init-healthcheck'

Now, start committing on your feature. When done, use:

     git flow feature finish init-healthcheck

Now we change our application so that it responds on a /healthz endpoint (this is taken, with slight adaptations, from here):

/*
Copyright 2014 The Kubernetes Authors All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// A simple server that is alive for 10 seconds, then reports unhealthy for
// the rest of its (hopefully) short existence.
package main

import (
  "fmt"
  "log"
  "net/http"
  "time"
)

func main() {
  started := time.Now()
  http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    http.ServeFile(w, r, "static/index.html")
  })
  http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
    duration := time.Now().Sub(started)
    if duration.Seconds() > 10 {
      w.WriteHeader(500)
      w.Write([]byte(fmt.Sprintf("error: %v", duration.Seconds())))
    } else {
      w.WriteHeader(200)
      w.Write([]byte("ok"))
    }

  })
  log.Println(http.ListenAndServe(":8080", nil))
}

This application now serves (as before) our index.html from / and additionally exposes a /healthz endpoint that responds with 200 OK for 10 seconds and with a 500 error after that. Essentially, we’ve introduced a bug into our endpoint that does not even surface to a user. Remember that time when your backend silently swallowed every 100th request? Good times…
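Before wiring this into Kubernetes, the behaviour is easy to verify locally: hit /healthz right after starting the server and again after ten seconds (assuming the app is running on localhost:8080):

curl -i http://localhost:8080/healthz
sleep 11
curl -i http://localhost:8080/healthz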

Now we also need to consume the healthz endpoint, which is done in our deployment spec.

---
kind: Service
apiVersion: v1
metadata:
  name: orohttp-${WERCKER_GIT_BRANCH}
spec:
  ports:
    - port: 80
      targetPort: http-server
      protocol: TCP
  type: LoadBalancer
  selector:
    name: orohttp-${WERCKER_GIT_BRANCH}
    branch: ${WERCKER_GIT_BRANCH}
    commit: ${WERCKER_GIT_COMMIT}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: orohttp-${WERCKER_GIT_BRANCH}
spec:
  replicas: 2
  template:
    metadata:
      labels:
        name: orohttp-${WERCKER_GIT_BRANCH}
        branch: ${WERCKER_GIT_BRANCH}
        commit: ${WERCKER_GIT_COMMIT}
    spec:
      containers:
      - name: orohttp-${WERCKER_GIT_BRANCH}
        image: oronu/nginx-simple-html:${WERCKER_GIT_COMMIT}
        ports:
        - name: http-server
          containerPort: 8080
          protocol: TCP
        livenessProbe:
          httpGet:
            path: /healthz
            port: http-server
          initialDelaySeconds: 15
          timeoutSeconds: 1

With those changes, we can push our new branch to GitHub and check the (new!) endpoint that Kubernetes created.

kubectl get svc -o wide
NAME                               CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)   AGE       SELECTOR
kubernetes                         100.64.0.1       <none>                                                                    443/TCP   2h        <none>
orohttp-feature-init-healthcheck   100.65.243.228   ab4871ba286f611e6a0a50e4d6ac19b8-294871847.us-east-1.elb.amazonaws.com    80/TCP    42s       branch=feature-init-healthcheck,commit=6b223dfc4c846e3cff52025356c2cd70c545cb27,name=orohttp-feature-init-healthcheck
orohttp-master                     100.71.47.208    af689c86086eb11e6a0a50e4d6ac19b8-1846451599.us-east-1.elb.amazonaws.com   80/TCP    1h        branch=master,commit=c9c84f1b9b479d2133541b2f3065af1d86559c94,name=orohttp-master
GET ab4871ba286f611e6a0a50e4d6ac19b8-294871847.us-east-1.elb.amazonaws.com
HTTP/1.1 200 OK
Accept-Ranges: bytes
Content-Length: 155
Content-Type: text/html; charset=utf-8
Last-Modified: Fri, 30 Sep 2016 10:14:18 GMT
Date: Fri, 30 Sep 2016 10:17:43 GMT

<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
    <title>Hello there</title>
  </head>
  <body>
 Automation for the People 
  </body>
</html>

For a user, everything looks fine; however, when we check the actual pods, we can see that they die after a short time

kubectl get pods
NAME                                                READY     STATUS             RESTARTS   AGE
orohttp-feature-init-healthcheck-1833998652-5k6vo   0/1       CrashLoopBackOff   5          3m
orohttp-feature-init-healthcheck-1833998652-n0ggi   0/1       CrashLoopBackOff   5          3m
orohttp-master-3020287202-dhii1                     1/1       Running            0          1h
orohttp-master-3020287202-icqgp                     1/1       Running            0          1h
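To see why the pods keep getting restarted, the events section of the pod description spells it out; the failed liveness probes and the resulting container kills show up there (the pod name below is just the first one from the listing above):

kubectl describe pod orohttp-feature-init-healthcheck-1833998652-5k6vo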

Let’s fix that:


/*
Copyright 2014 The Kubernetes Authors All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
    http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

// A simple server that is alive for 10 seconds, then reports unhealthy for
// the rest of its (hopefully) short existence.
package main

import (
  "log"
  "net/http"
)

func main() {
  http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
    http.ServeFile(w, r, "static/index.html")
  })
  http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(200)
    w.Write([]byte("ok"))
  })
  log.Println(http.ListenAndServe(":8080", nil))
}

However, deploying this change failed with the following error:

The Deployment "orohttp-feature-init-healthcheck" is invalid.
spec.template.metadata.labels: Invalid value: {"branch":"feature-init-healthcheck","commit":"latest","name":"orohttp-feature-init-healthcheck"}: `selector` does not match template `labels`

Uh-oh, this is not related to our build file but to our infrastructure. It seems to be caused by https://github.com/kubernetes/kubernetes/issues/26202, which suggests that changing selectors (what the load balancer uses to decide which containers to route to) is not a good idea; instead, new load balancers should be created. For our use case, let’s simply remove the commit label from the selector, since it is not needed anyway (the commit is already referenced via the image tag).
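Since the selector of an existing Deployment cannot simply be changed in place, the quickest way out for a throwaway feature branch (if the next apply still complains) is to delete the old Deployment and let the next pipeline run recreate it with the corrected labels:

kubectl delete deployment orohttp-feature-init-healthcheck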

After that is fixed, let’s recheck our deployment

kubectl get pods
NAME                                               READY     STATUS    RESTARTS   AGE
orohttp-feature-init-healthcheck-568167226-mm7uf   1/1       Running   0          1m
orohttp-feature-init-healthcheck-568167226-xvokv   1/1       Running   0          1m
orohttp-master-3020287202-dhii1                    1/1       Running   0          1h
orohttp-master-3020287202-icqgp                    1/1       Running   0          1h

Much better. Let’s finish our work by merging the feature branch back and recheck our deployment one last time.

git flow feature finish init-healthcheck
git push
Merge made by the 'recursive' strategy.
 app.go   | 31 +++++++++++++++++++++++++------
 kube.yml |  8 ++++++--
 2 files changed, 31 insertions(+), 8 deletions(-)
Deleted branch feature-init-healthcheck (was 1e24202).

Summary of actions:
- The feature branch 'feature-init-healthcheck' was merged into 'develop'
- Feature branch 'feature-init-healthcheck' has been removed
- You are now on branch 'develop'
kubectl get deployments,pods
NAME                                               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
orohttp-develop                                    2         2         2            2           47s
orohttp-feature-init-healthcheck                   2         2         2            2           7m
NAME                                               READY     STATUS    RESTARTS     AGE
orohttp-develop-3627383002-joyey                   1/1       Running   0            47s
orohttp-develop-3627383002-nk3me                   1/1       Running   0            47s
orohttp-feature-init-healthcheck-568167226-mm7uf   1/1       Running   0            7m
orohttp-feature-init-healthcheck-568167226-xvokv   1/1       Running   0            7m

Cleanup

terraform plan -destroy 
terraform destroy -force
Error applying plan:

2 error(s) occurred:

 aws_ebs_volume.us-east-1a-etcd-events-dev-k8s-orovecchia-com: Error deleting EC2 volume vol-3d28229a: VolumeInUse: Volume vol-3d28229a is currently attached to i-1a27720c
    status code: 400, request id: a1df6173-5f72-4c43-90d4-8a723f32dcd4
 aws_ebs_volume.us-east-1a-etcd-main-dev-k8s-orovecchia-com: Error deleting EC2 volume vol-192822be: VolumeInUse: Volume vol-192822be is currently attached to i-1a27720c
    status code: 400, request id: 1ce03a4f-1b81-4868-9586-57047ffb1afa

Terraform does not automatically rollback in the face of errors.
Instead, your Terraform state file has been partially updated with
any resources that successfully completed. Please address the error
above and apply again to incrementally change your infrastructure.

Oh well, it looks like Terraform (or rather, AWS) did not update its state quickly enough. No issue though; you can simply rerun the command.

terraform destroy -force

Voilà. However, it is recommended to also use Kops to delete the cluster, to make sure that any ELBs or volumes created during the cluster’s lifetime are cleaned up as well.

~/golang/bin/kops delete cluster --yes dev.k8s.orovecchia.com --state=s3://oro-kops-state 

ToDos

Now granted, this is not a comprehensive guide.

  • It is still missing any sort of notification in case something goes wrong
  • There is no automatic cleanup of old deployments
  • There is no automatic rollback in case of errors (a manual stopgap is sketched below)
  • And, above all: this is extremely complicated just to host a simple web page. Again, for static files only, you are much better off using something like GitHub Pages or even S3.
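On the rollback point: Kubernetes Deployments keep a revision history, so until something automated is in place, a manual rollback is at least a one-liner (shown here for the master deployment used throughout this guide):

kubectl rollout undo deployment/orohttp-master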

Closing remarks

Would I recommend using Kubernetes? ABSOLUTELY.

Not only is Kubernetes extremely sophisticated, it is also advancing at an incredible speed. For reference, I tried it out around a year ago with v0.18, and it did not yet have Deployments, Pet Sets, batch Jobs, or ConfigMaps, all of which are incredibly helpful.

Having said that, I am not sure I would necessarily recommend Wercker. Granted, it works nicely - when it works. I ran into several panics when trying to run the wercker CLI locally, there is NO output whatsoever in the web GUI if the working directory does not exist, and the documentation is severely outdated. It is still in beta, yes, but if this is an indication of things to come, I am not sure I would want to bet on it for something as critical as a CI server.

TL;DR

To bootstrap a Kubernetes cluster:

kops create cluster --zones=us-east-1a dev.k8s.orovecchia.com --state=s3://oro-kops-state --yes

To push a new version of our code or infrastructure:

wercker deploy --pipeline kube-deploy