Abandon the Helm: leveraging CDK for Kubernetes

I've had enough of Helm. I don't know who thought string-based templating engines were a good idea, but I have hit one too many indentation-related bugs. They're a source of bugs and a pain. Kubernetes YAML files also contain a ton of boilerplate configuration. How many times do I have to specify the same labels? It's spec/template/spec for a Deployment, but spec/jobTemplate/spec for a CronJob. Ain't nobody got time to remember that.

Enter cdk8s. It's built on CDK, a software development kit that lets you define resources in standard programming languages, like TypeScript, Python, or Java, which then get compiled into YAML or JSON to upload to CloudFormation, or in our case, Kubernetes.

Why would you want/need a full programming language just to define some infrastructure? Well, there are some benefits. Let’s go through them.

The Good

No more string based templating

In Helm, when you template files, you use Golang's text templating system. You start writing YAML text, then, depending on your use case, mix in some variables, some conditionals, some loops, and more. At first it seems reasonable, and maybe you've only got a few variables to swap out (snippet source):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  labels:
    app: {{ template "sentry.name" . }}
    chart: {{ .Chart.Name }}-{{ .Chart.Version | replace "+" "_" }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
# ...
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        checksum/config2: {{ include (print $.Template.BasePath "/sentry-config-file.yaml") . | sha256sum }}
        checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
      labels:
        component: web
        chart: {{ .Chart.Name }}
    spec:
      containers:
      - args:
        - run
        - web
        {{- if .Values.ingress.tls_secret_name }}
        env:
        - name: SENTRY_USE_SSL
          value: "1"
        {{- end }}

Maybe it stays that way, or maybe you need more and more variables and substitutions, especially if you're vending a Helm chart to others. Then you end up with everything needing to be passed as values, like this template from ingress-nginx.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    {{- include "ingress-nginx.labels" . | nindent 4 }}
    app.kubernetes.io/component: controller
    {{- with .Values.controller.labels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
  name: {{ include "ingress-nginx.controller.fullname" . }}
  namespace: {{ include "ingress-nginx.namespace" . }}
  {{- if .Values.controller.annotations }}
  annotations: {{ toYaml .Values.controller.annotations | nindent 4 }}
  {{- end }}
spec:
  selector:
    matchLabels:
      {{- include "ingress-nginx.selectorLabels" . | nindent 6 }}
      app.kubernetes.io/component: controller
  revisionHistoryLimit: {{ .Values.revisionHistoryLimit }}
  {{- if .Values.controller.updateStrategy }}
  updateStrategy: {{ toYaml .Values.controller.updateStrategy | nindent 4 }}
  {{- end }}
  minReadySeconds: {{ .Values.controller.minReadySeconds }}
  template:
    metadata:
    {{- if .Values.controller.podAnnotations }}
      annotations:
      {{- range $key, $value := .Values.controller.podAnnotations }}
        {{ $key }}: {{ $value | quote }}
      {{- end }}
    {{- end }}
# ...

By that point, you've lost all meaning of templating. The YAML has become a shell of its original self; it's merely a vessel that values.yaml passes through on the way to its final destination, your Kubernetes cluster. Then you ask yourself: is this the best it can be? Is any of this logic correct? Can you even tell at a glance without unit tests? Wait, a templating system has unit tests? Indeed.

With cdk8s, you instead get to define and expose class-based properties and fields. For example, instead of explicitly listing every single property that can be overridden, you can do:

// Allow the user to override a few props
type OverridableProps = Pick<DaemonSetProps, 'image' | 'imagePullPolicy'>;

function makeDaemonSet(scope: Construct, props: OverridableProps) {
  return new DaemonSet(scope, 'nginx', {
    name: 'controller',
    image: 'registry.k8s.io/ingress-nginx/controller',
    // Allow the user to override things:
    ...props,
  });
}
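
A consumer can then override just those two fields and nothing else. A hypothetical usage, keeping the simplified props shape from the snippet above:

// 'chart' is assumed to be a cdk8s Chart created elsewhere.
// Only image and imagePullPolicy can be overridden; everything else stays pinned.
const daemonSet = makeDaemonSet(chart, {
  image: 'registry.k8s.io/ingress-nginx/controller:v1.11.0',
});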

No more indentation hell

Using Helm charts means that you're forced to be very careful about indentation. It gets worse once you start mixing templates into the picture.

For example, I found this bug in ingress-nginx caused by improper indentation:

      initContainers:
        - command:
          - /bin/chown
          - "101"
          - /mycache
          image: busybox:latest
          name: chmod
          volumeMounts:
          - mountPath: /mycache
            name: cache
          # <--- Notice that this is aligned under the volumeMounts, not under initContainers
          # <-- Also note the empty name and image fields
          - name:
            image: 
            command: ['sh', '-c', '/usr/local/bin/init_module.sh']
            volumeMounts:
              - name: modules
                mountPath: /modules_mount

The bug happened because the following snippet had one too many tabs at the beginning, but you'd never know that just by looking at it:

      {{- if .Values.controller.extraModules }}
        {{- range .Values.controller.extraModules }}
          - name: {{ .Name }}
            image: {{ .Image }}
            command: ['sh', '-c', '/usr/local/bin/init_module.sh']

CDK fixes that. You no longer care about indentation (unless, of course, you use the Python bindings for CDK8s). You use normal objects, set properties on them, call methods, and so on. CDK8s is then responsible for serializing all of that into a properly indented YAML file.
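
For instance, here's a rough sketch of the init container from the buggy snippet above, expressed as code with no indentation to get wrong. It assumes the cdk8s-plus addInitContainer API; the chart id and module-version suffix are illustrative:

import { App, Chart } from 'cdk8s';
import * as kplus from 'cdk8s-plus-30'; // pick the suffix matching your cluster version

const app = new App();
const chart = new Chart(app, 'example');

const deployment = new kplus.Deployment(chart, 'controller', {
  containers: [{ image: 'registry.k8s.io/ingress-nginx/controller' }],
});

// The init container is just a method call; cdk8s decides where it lands in the YAML.
deployment.addInitContainer({
  name: 'init-modules',
  image: 'busybox:latest',
  command: ['sh', '-c', '/usr/local/bin/init_module.sh'],
});

app.synth();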

Reusable functions

With Helm templating, you frequently end up with a lot of boilerplate YAML hanging around. For example, a bunch of my network policies ended up having the same egress rule in them. Before, I would copy and paste the following file into many different namespaces:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ...
  namespace: ...
spec:
  egress:
    # Allow access to DNS
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
      to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns

Helm does support reusable named templates, but I was using Rancher Fleet, which I'm not sure supports them, and they still suffer from the indentation problems mentioned above.

With code-based solutions, this entire model changes because I can just write a function that can be re-used by different constructs. This is one of the most powerful features of code-based infrastructure as code.

function grantDns(context: Construct, netPolicy: kplus.NetworkPolicy) {
    const peer = kplus.Pods.select(context, 'netpol-kube-dns', {
        namespaces: kplus.Namespaces.select(context, 'netpol-kube-system', {
            names: ['kube-system']
        }),
        labels: {
            'kube-app': 'kube-dns'
        }
    });

    netPolicy.addEgressRule(peer, [
        NetworkPolicyPort.tcp(53),
        NetworkPolicyPort.udp(53)
    ]);
}

const netPol = new kplus.NetworkPolicy(context, 'netpol');

grantDns(context, netPol);

Compile time type-safety

Does that property exist on that resource? Are you missing anything critical? Any invalid field values? YAML provides no compile-time validation. I like to use yamllint to see if it’s syntactically valid YAML, but that doesn’t validate that fields exist.

CDK8s gets this right. Fields that don't exist on a resource can't be set in code; it simply won't compile.
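
For example, a minimal sketch (the chart and image are illustrative, and kplus is cdk8s-plus as in the earlier snippets): a misspelled field that YAML would happily accept is a compile error in TypeScript:

// 'replicas' is a real DeploymentProps field; a typo like 'replica' simply won't compile.
new kplus.Deployment(chart, 'web', {
  replicas: 2,
  // replica: 2,   // <-- compile error: 'replica' does not exist in type 'DeploymentProps'
  containers: [{ image: 'nginx' }],
});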

The Bad

Can’t move resources

This problem is inherited from the underlying CDK design. In CDK, resource names are derived from the construct path plus a short hash. A resource might have the path App/Chart/MyService/Deployment, which produces the resource name chart-myservice-deployment-c873a441.

If I rename the construct to Deployment2, I get a new resource name, chart-myservice-deployment2-c8cb06b1, and Helm will delete the old Deployment and create a new one.

This is dangerous. When you're writing code, it's easy to forget about this: sometimes a refactoring is needed to fix some issue, but you can't always do it safely because Helm will end up deleting and recreating a resource. Helm also doesn't have CloudFormation's safe deployment mechanism, where dependencies between resources are identified, creates are deployed first, and deletes are only performed once everything succeeds. Helm has a statically defined order in which it deploys resource kinds. That sort of replicates CloudFormation's ordering strategy, but it doesn't actually verify that resources are working correctly. For example, a PersistentVolumeClaim can be created but fail to actually provision, and the deployment will get stuck.
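
One mitigation, sketched below with illustrative ids (chart and kplus as in the earlier snippets), is to pin metadata.name yourself on resources you can't afford to have recreated, so a construct rename doesn't change the Kubernetes name:

// Default: the Kubernetes name is derived from the construct path plus a hash,
// so renaming the 'Deployment' id changes the name and Helm replaces the resource.
new kplus.Deployment(chart, 'Deployment', {
  containers: [{ image: 'nginx' }],
});

// Pinning the name decouples it from the construct path; refactors are now safe,
// at the cost of managing name uniqueness yourself.
new kplus.Deployment(chart, 'Deployment2', {
  metadata: { name: 'my-service-web' },
  containers: [{ image: 'nginx' }],
});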

Deployment tooling

CDK8s itself provides a bunch of classes that let you generate YAML files, but those aren't useful unless you have something that actually deploys them. This deployment tooling is very important.

It is the glue that takes the synthesized output files and actually gets them onto your K8s cluster. Base CDK does pretty well here because it inherently only has to work with AWS: it has a built-in CLI to deploy to CloudFormation, or you can deploy using CodePipeline or GitHub Actions. To be honest, I only worked with the Amazon-internal variant of the pipeline, which is different, but it works well.

What are we missing? The CDK CLI provides a cdk deploy command that synthesizes your output, identifies the dependencies between stacks, then deploys the stacks sequentially in dependency order using CloudFormation.

I want the thing that manages the deployment part of CI/CD. Kubernetes has a lot of options here. Let's explore some.

What about Helm?

Helm provides a CLI that, given a YAML manifest, will create, update, and delete the resources in Kubernetes. I could use GitHub Actions to first synthesize my cdk8s application, then deploy it like below:

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - id: synth
        name: Synth cdk8s manifests
        run: |-
          npm install
          npm run build

      - name: Set up kubectl
        uses: azure/setup-kubectl@v4.0.0
        with:
          version: 'v1.31.0'

      - name: Set up Helm
        uses: azure/setup-helm@v4.3.0

      - name: Create kubeconfig file
        run: echo "${{ secrets.KUBECONFIG }}" > ${{ github.workspace }}/kubeconfig

      - name: Deploy to Kubernetes using Helm
        if: github.ref == 'refs/heads/master'
        env:
          KUBECONFIG: ${{ github.workspace }}/kubeconfig
        run: |
          helm upgrade --install mytarget dist/ --namespace default

Functionally that works; however, it lacks some features:

Can't deploy to multiple Helm releases. My Git repo has several different Helm charts and releases to deploy, and some of them had to be deployed first. CDK8s doesn't make this easy.

const app = new cdk8s.App();

const chart1 = new cdk8s.Chart(app, 'common');
const chart2 = new cdk8s.Chart(app, 'service1');
const chart3 = new cdk8s.Chart(app, 'service2');

app.synth();

Everything gets emitted to the dist folder. Because there's only one Chart.yaml and everything lives under the same templates directory, I can't use the Helm CLI to deploy one file to one release.

dist/Chart.yaml
dist/templates/common.k8s.yaml
dist/templates/service1.k8s.yaml
dist/templates/service2.k8s.yaml

If we look back at my GitHub Actions attempt:

- name: Deploy to Kubernetes using Helm
  if: github.ref == 'refs/heads/master'
  env:
    KUBECONFIG: ${{ github.workspace }}/kubeconfig
  run: |
    helm upgrade --install mytarget dist/ --namespace default

The only fix I've come up with is to emit separate chart files, then run a GitHub Actions step that moves them into separate chart folders that Helm can deploy separately, as sketched below.
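
A rough sketch of that idea, with an illustrative directory layout and release names: synthesize each cdk8s Chart from its own App into its own output directory, so each one can become its own Helm release. You'd still need a CI step to drop a minimal Chart.yaml into each folder.

import * as cdk8s from 'cdk8s';

// One App per Helm release, each with its own output directory.
for (const release of ['common', 'service1', 'service2']) {
  const app = new cdk8s.App({ outdir: `dist/${release}/templates` });
  new cdk8s.Chart(app, release); // ...add the constructs for this release here...
  app.synth();
}

// Afterwards, CI writes a minimal Chart.yaml into each dist/<release>/ folder and runs
// `helm upgrade --install <release> dist/<release>/` once per release.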

What about ArgoCD?

There's also ArgoCD, which is a common Kubernetes-native solution for managing deployments. I've avoided it thus far because it has always seemed overly complex, with several different controllers all running, for my use case of just deploying some YAML to a cluster. Do I really need that many controllers just to deploy some YAML?

Also, it seems like when ArgoCD works with a Git repo, it doesn't know how to compile the manifests first. At least, that's what this blog post implies; it required two separate Git repositories.

Admittedly, I didn't test out ArgoCD, so there might be something I'm missing, but it still won't fix some of the other issues with coding in YAML.

Working with legacy resources

If you have existing resources written in raw YAML that are already created in an existing Helm release, and you want to adopt cdk8s, you're going to be in a tricky place. To switch that existing release from raw YAML to cdk8s, you have to either:

  1. Synth the output files and put them in the same folder as your legacy YAML

Whether this works or not depends on what kind of CI/CD you currently use. If you're using something like Rancher Fleet, like I was until I realized it was fragile and frequently broke, then you now need two different Git repos: one with the cdk8s code, and one with raw YAML into which your CI/CD commits the synthesized output. This post talks about that model using GitHub Actions, but that complexity terrified me.

  2. Import the legacy resources as-is and include them in the output

When I worked at AWS and owned a program to migrate from an internal framework that used raw YAML CloudFormation to native CDK, we used CDK's CfnInclude construct to do this. CDK8s has an equivalent called Include.
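
A minimal sketch of what that looks like in cdk8s (the chart id and file path are illustrative): Include pulls the existing raw manifests into the synthesized output alongside your new constructs.

import { App, Chart, Include } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'legacy');

// Bring the legacy manifests along as-is; 'url' accepts a local path or a URL.
new Include(chart, 'legacy-manifests', {
  url: 'manifests/legacy.yaml',
});

app.synth();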

Messy auto-generated resources

CDK8s has a number of dev-friendly methods, like the ability to create NetworkPolicies granting access with just one line of code:

myService.connections.allowTo(redis.workload);

From a dev perspective, this is a massive productivity boost. It can auto-create both the ingress and egress policies. However, cdk8s generates multiple NetworkPolicies for each and every grant, with automatically generated names.

From a debugging and operational perspective, this makes it difficult to look at your cluster's resources and figure out which policies apply and what each one means.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allowegressc8116eb74577cb8dbda1910afbbd7c3456-c8832897
  namespace: foo
spec:
  egress:
    # ...
  podSelector:
    matchLabels:
      cdk8s.io/metadata.addr: example-TestService-Deployment-c8d415ef
  policyTypes:
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allowingresskube-systemc8116eb74577cb8dbda191-c803ee19
  namespace: kube-system
spec:
  ingress:
    # ...
  podSelector:
    matchLabels:
      kube-app: kube-dns
  policyTypes:
    - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allowegressc8b9fe52de2665a49bb6866db05c076914-c8924060
  namespace: foo
spec:
  egress:
    # ...
  podSelector:
    matchLabels:
      cdk8s.io/metadata.addr: example-TestService-Deployment-c8d415ef
  policyTypes:
    - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allowingressfooc8b9fe52de2665a49bb6866d-c8819c21
  namespace: foo
spec:
  ingress:
    # ...
  podSelector:
    matchLabels:
      cdk8s.io/metadata.addr: example-TestService-Redis-c8b9fe52
  policyTypes:
    - Ingress

Difference in mental model compared to YAML

I expect differences when comparing a programming language to YAML templates because they're fundamentally different styles of writing. But YAML is the language for describing Kubernetes resources: tutorials use it, and this blog uses YAML for Kubernetes. With so much documentation written in YAML, you now have to mentally translate it into the CDK8s equivalent. Simple things generally look pretty similar; a container, for example, looks about the same in both. But compare this NetworkPolicy:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: homeassistant
spec:
  podSelector:
    matchLabels:
      app: homeassistant
  policyTypes:
    - Egress
  egress:
    # Allow DNS queries
    - ports:
      - port: 53
        protocol: TCP
      - port: 53
        protocol: UDP
      to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: kube-system
        podSelector:
          matchLabels:
            k8s-app: kube-dns

And the equivalent in CDK8s:

const netPolicy = new kplus.NetworkPolicy(context, 'netpolicy');
const peer = kplus.Pods.select(context, 'netpol-kube-dns', {
  namespaces: kplus.Namespaces.select(context, 'netpol-kube-system', {
    names: ['kube-system']
  }),
  labels: {
    'kube-app': 'kube-dns'
  }
});

netPolicy.addEgressRule(peer, [
  NetworkPolicyPort.tcp(53),
  NetworkPolicyPort.udp(53)
]);

At first I didn't even know about Pods.select and Namespaces.select and assumed I had to create my own custom peer class. It wasn't until I started preparing to make a PR that I found the doc that explains this.

This is not unique to CDK8s; it's prevalent in CDK vs. CloudFormation too, and even in Terraform/OpenTofu. It's also not necessarily a terrible problem, but the more the field names differ, and the more you have to reach for different classes or pass around CDK scopes and names, the more confusing it gets.

Sure, I know the answer now, but these things will be confusing for the next person.

Unexpected defaults

CDK8s provides defaults for Deployments that I don't think it should. For example, it provides default resource requests and limits and a default security context:

resources:
  limits:
    cpu: 1500m
    memory: 2048Mi
  requests:
    cpu: 1000m
    memory: 512Mi
securityContext:
  allowPrivilegeEscalation: false
  privileged: false
  readOnlyRootFilesystem: true
  runAsNonRoot: true

CPU resource limits are bad. You probably shouldn't use them unless you know your process has a maximum thread count. On top of that, CDK8s limits you to 1.5 CPU cores, which isn't a round number, and if you do have two busy threads, one of them is going to get throttled for half of every second. I imagine they're just picking some default value to help Kubernetes bin-pack, but in this case, I'd say they're just picking a bad number.

While enforcing a read-only root filesystem is good security practice™, it's also likely to break a lot of software. Many of the components I run need the ability to write temp files and the like. I'm not sure why they chose this default. Maybe they wanted opt-out security, which is great if your software can run that way, but it's also very inconsistent.

My opinion: CDK8s should use the same defaults as Kubernetes itself. If they want to provide secure, robust defaults, they should offer a higher-level construct. I've seen this employed inside Amazon for security-sensitive CDK constructs.
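
Until then, you can opt back out per container. A sketch, assuming current cdk8s-plus container props (an empty resources object and an explicit security context; the exact knobs may differ between versions, and the image is illustrative):

// 'deployment' is an existing kplus.Deployment, as in the earlier snippets.
deployment.addContainer({
  image: 'ghcr.io/home-assistant/home-assistant:stable', // illustrative image
  resources: {},                     // emit no requests/limits instead of the 1.5-core default
  securityContext: {
    readOnlyRootFilesystem: false,   // many apps need to write temp files
    ensureNonRoot: false,            // run with the image's own user
  },
});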

Conclusion

CDK8s provides some genuinely useful productivity improvements over writing raw YAML-defined resources. The built-in compile-time type safety, code completion in an IDE, and the ability to abstract away repetitive code are real time savers.

However, it does have some downsides. The big one is the lack of that last bit of deployment tooling that actually gets the synthesized output onto my Kubernetes cluster.
