I’ve had enough of Helm. I don’t know who thought string-based templating engines would be a good idea, but I have had one too many indentation-related bugs. They’re a source of bugs and a pain. Kubernetes YAML files also contain a ton of boilerplate configuration. Like, how many times do I have to specify the labels? It’s spec/template/spec for a Deployment, but spec/jobTemplate/spec for a CronJob. Ain’t nobody got time to remember that.
Enter cdk8s. It’s built upon the CDK, a software development kit that uses standard programming languages, like TypeScript, Python, or Java, to define resources that then get compiled into YAML or JSON and uploaded to CloudFormation, or in our case, Kubernetes.
Why would you want/need a full programming language just to define some infrastructure? Well, there are some benefits. Let’s go through them.
The Good
No more string based templating
In Helm, when you’re templating files, you use Golang’s text templating system. You start writing YAML text, then depending on your use case, mix in some variables, some conditionals, some loops, and more. At first, it seems reasonable and maybe you’ve only got a few variables to swap out (snippet source):
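Something like this minimal sketch (illustrative, not the original snippet), with only an image tag and a replica count templated:

```yaml
# templates/deployment.yaml -- illustrative; the chart and value names are assumptions
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app: {{ .Release.Name }}-web
  template:
    metadata:
      labels:
        app: {{ .Release.Name }}-web
    spec:
      containers:
        - name: web
          image: "nginx:{{ .Values.image.tag }}"
```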
Maybe it stays that way, or maybe you need more and more variables and substitutions, especially if you’re vending a Helm chart to others. Then you end up with everything needing to be passed in as values, like this template from ingress-nginx.
*(the heavily templated ingress-nginx snippet is not reproduced here)*
By that point, you’ve lost all meaning of templating. The YAML has become but a shell of its original self; it’s merely a vessel that the values.yaml values pass through on the way to their final destination, your Kubernetes cluster. Then you ask yourself: is this the best it can be? Is any of this logic correct? Can you even tell at a glance without unit tests? Wait, a templating system has unit tests? Indeed.
Now, you get to define and expose class-based properties and fields. For example, instead of explicitly listing every single property that can be overridden, you can do:
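A sketch of the idea (the construct and property names here are my own, not from any particular chart):

```ts
import { Construct } from 'constructs';
import * as kplus from 'cdk8s-plus-27'; // any recent cdk8s-plus version

export interface WebServiceProps {
  readonly image: string;
  readonly replicas?: number;
  // Anything else a consumer needs can be surfaced as a typed property,
  // instead of threading every field through values.yaml.
  readonly containerOverrides?: Partial<kplus.ContainerProps>;
}

export class WebService extends Construct {
  constructor(scope: Construct, id: string, props: WebServiceProps) {
    super(scope, id);

    new kplus.Deployment(this, 'deployment', {
      replicas: props.replicas ?? 2,
      containers: [{ image: props.image, ...props.containerOverrides }],
    });
  }
}
```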
No more indentation hell
Using Helm charts means that you’re forced to be very careful about indentation. It gets worse once you start mixing templates into the picture.
For example, I found this bug in ingress-nginx caused by improper indentation:
*(snippet illustrating the ingress-nginx indentation bug omitted)*
The bug was that the following snippet had one too many tabs at the beginning, but you’d never know that just by looking at it:
*(the over-indented template snippet omitted)*
cdk8s fixes that. You no longer care about indentation (unless, of course, you use the Python bindings for cdk8s). You use normal objects: set properties on them, call methods, and so on. cdk8s is then responsible for serializing that into a properly indented YAML file.
Reusable functions
With Helm templating, you frequently end up with a lot of boilerplate YAML repeated everywhere. For example, a bunch of my network policies ended up having the same egress policy in them. Before, I would copy and paste the following file into many different namespaces:
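It was a plain NetworkPolicy roughly along these lines (illustrative; the selectors and ports are stand-ins for my actual policy):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-egress
  namespace: my-namespace   # copy, paste, change the namespace, repeat...
spec:
  podSelector: {}            # applies to every pod in the namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53           # allow DNS
    - ports:
        - protocol: TCP
          port: 443          # allow HTTPS anywhere
```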
Helm does support reusable templates, but I was using Rancher Fleet, which I’m not sure supports them, and they still suffer from the indentation problem mentioned above.
With code-based solutions, this entire model changes because I can just write a method that can be reused by different constructs. This is one of the most powerful features of code-based infrastructure as code.
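A sketch of what that can look like for the egress policy above, using cdk8s’s raw ApiObject (the helper name and rules are mine):

```ts
import { ApiObject } from 'cdk8s';
import { Construct } from 'constructs';

// Hypothetical helper: every chart that needs the shared egress policy calls this
// once, instead of copy-pasting the YAML file into its namespace.
export function addDefaultEgressPolicy(scope: Construct, namespace: string): ApiObject {
  return new ApiObject(scope, 'default-egress', {
    apiVersion: 'networking.k8s.io/v1',
    kind: 'NetworkPolicy',
    metadata: { name: 'default-egress', namespace },
    spec: {
      podSelector: {},
      policyTypes: ['Egress'],
      egress: [
        {
          to: [{ namespaceSelector: { matchLabels: { 'kubernetes.io/metadata.name': 'kube-system' } } }],
          ports: [{ protocol: 'UDP', port: 53 }],
        },
        { ports: [{ protocol: 'TCP', port: 443 }] },
      ],
    },
  });
}
```

Each namespace’s chart then just calls something like `addDefaultEgressPolicy(this, 'team-a')`.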
Compile-time type-safety
Does that property exist on that resource? Are you missing anything critical? Any invalid field values? YAML provides no compile-time validation. I like to use yamllint to check that a file is syntactically valid YAML, but that doesn’t validate that the fields actually exist.
cdk8s gets this right. Fields that don’t exist on the resource can’t be set in code; it simply does not compile.
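For instance (a hedged sketch; the `deployment` object is assumed to exist elsewhere), a typo’d field never makes it past the compiler:

```ts
import * as kplus from 'cdk8s-plus-27';

declare const deployment: kplus.Deployment; // assume a Deployment defined elsewhere

deployment.addContainer({
  image: 'nginx:1.25',
  // imagePullPolcy: 'IfNotPresent',
  //   ^ misspelled field: TypeScript rejects the object literal at compile time,
  //     instead of the typo silently reaching the cluster.
});
```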
The Bad
Can’t move resources
This is another problem with the underlying CDK design. In CDK, resource names are derived from the path of constructs and resource names. A resource might have the path App/Chart/MyService/Deployment, which produces the resource name chart-myservice-deployment-c873a441. If I rename the construct to Deployment2, I get a new resource name Chart-MyService-Deployment2-c8cb06b1, and Helm will delete the old deployment and create a new one.
This is dangerous. When you’re writing code, it’s easy to forget about this: sometimes a refactoring is needed to fix some issue, but you can’t always do it safely, because Helm will end up deleting and recreating a resource. Helm also doesn’t have CloudFormation’s safe deployment mechanism, where dependencies between resources are identified, creates are deployed first, and then, only if everything succeeds, deletes are performed. ==Helm has a statically defined order in which it deploys resources==. This sort of replicates CloudFormation’s deployment ordering strategy, but it doesn’t actually guarantee that resources are working correctly. For example, a PersistentVolumeClaim can be created but fail to actually provision, and the deployment will get stuck.
Deployment tooling
cdk8s itself provides a bunch of classes that let you generate YAML files, but those aren’t useful unless you have something that actually deploys those files. This deployment tooling is very important. It is the glue that takes the synthesized output files and actually gets them onto your K8s cluster. Base CDK does pretty well here because it only has to work with AWS: it has a built-in CLI to deploy to CloudFormation, and with CDK Pipelines you can deploy via CodePipeline or GitHub Actions. To be honest, I only worked with the Amazon-internal variant of the pipeline, which is different, and it works well.
What are we missing? The CDK CLI itself provides a cdk deploy command that synthesizes your output, identifies the dependencies between the stacks, then sequentially deploys the stacks in dependency order using CloudFormation.
I want the thing that manages the deployment part of CI/CD. Kubernetes has a lot of options here. Let’s explore some.
What about Helm?
Helm provides a CLI that, given a YAML manifest, will create, update, and delete the resources in Kubernetes. I could use GitHub Actions to first synthesize my cdk8s application, then deploy it like below:
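A sketch of that workflow (the release name, namespace, and the assumption that the synthesized dist/ is shaped like a Helm chart are all mine):

```yaml
name: deploy
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # Synthesize the cdk8s app; assumes dist/ ends up as a Helm chart
      # (a Chart.yaml plus a templates/ directory).
      - run: npx cdk8s synth
      # Cluster credential setup omitted for brevity.
      - run: helm upgrade --install my-release ./dist --namespace my-namespace --create-namespace
```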
Functionally that works; however, it lacks some features:
Can’t deploy to multiple Helm releases. My Git repo has several different Helm charts and releases that got deployed, and some of them had to be deployed first. cdk8s doesn’t make this easy.
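Roughly what that looks like in the app (the chart names are illustrative): I can express the ordering in code, but nothing carries it through to separate Helm releases.

```ts
import { App, Chart } from 'cdk8s';

const app = new App();

const infra = new Chart(app, 'infra');       // namespaces, CRDs, shared policies
const services = new Chart(app, 'services'); // the actual workloads

// cdk8s can record the dependency, but a plain `helm upgrade` of the combined
// output doesn't turn this into "deploy infra first, then services".
services.addDependency(infra);

app.synth();
```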
Everything gets emitted to the dist folder. Because there’s only one Chart.yaml and everything is under the same templates directory, I can’t use the Helm CLI to deploy one file to one release.
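The synthesized layout looks roughly like this (file names illustrative):

```
dist/
├── Chart.yaml          # one chart for everything
└── templates/
    ├── infra.k8s.yaml
    └── services.k8s.yaml
```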
If we look back to my GitHub Actions attempt:
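The deploy step (same assumed names as before) only ever targets a single chart directory and a single release:

```yaml
      # One step, one chart directory, one release -- there's nowhere to say
      # "deploy the infra chart first, then the services chart".
      - run: helm upgrade --install my-release ./dist --namespace my-namespace --create-namespace
```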
The only idea I have to fix this is to emit separate chart files, then run a GitHub Actions step that moves them all into separate chart folders that Helm can deploy separately.
What about ArgoCD?
There’s also ArgoCD, which is a common Kubernetes-native solution for managing deployments. I’ve avoided it thus far because it has always seemed overly complex, with several different controllers all running, for my use case of just deploying some YAML to a cluster. Do I really need that many controllers just to deploy some YAML?
Also, it seems like when ArgoCD works with a Git repo, it doesn’t know how to first compile the manifests. At least, that’s what this blog post implies, and it required two separate Git repositories.
Admittedly, I didn’t test out ArgoCD, so there might be something I’m missing, but it still won’t fix some of the other issues with coding in YAML.
Working with legacy resources
If you have existing resources written in raw YAML that were already created in an existing Helm release and you want to adopt cdk8s, you’re going to be in a tricky place. To switch an existing Helm release from raw YAML to cdk8s, you have to either:
- Synth the output and put it in the same folder as your legacy YAML
Whether this works or not depends on what kind of CI/CD you currently use. If you’re using something like Rancher Fleet, as I was until I realized it was fragile and frequently broke, then you now have to have two different Git repos: one with cdk8s, and one with raw YAML that your CI/CD commits the synthesized output into. This post talks about that model using GitHub Actions, but that complexity terrified me.
- Import the legacy resources as-is and include them in the output
When I worked at AWS and owned a program to switch from an internal framework that used raw YAML CloudFormation to native CDK, we used CDK’s CfnInclude construct to do this. cdk8s has an equivalent called Include.
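A minimal sketch of using it (the file path is a placeholder):

```ts
import { App, Chart, Include } from 'cdk8s';

const app = new App();
const chart = new Chart(app, 'legacy');

// Pulls the existing raw manifests into the synthesized output unchanged,
// so the Helm release keeps managing the same resources.
new Include(chart, 'legacy-manifests', { url: 'legacy/manifests.yaml' });

app.synth();
```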
Messy auto-generated resources
CDK8s has a number of dev-friendly methods, like the ability to create NetworkPolicies with permissions using just one line of code:
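A sketch of the idea with cdk8s-plus (the package version and workload names are assumptions):

```ts
import { App, Chart } from 'cdk8s';
import * as kplus from 'cdk8s-plus-27';

const app = new App();
const chart = new Chart(app, 'example');

const web = new kplus.Deployment(chart, 'web', { containers: [{ image: 'nginx' }] });
const database = new kplus.Deployment(chart, 'database', { containers: [{ image: 'postgres' }] });

// One line: emits an egress policy for `web` and an ingress policy for `database`.
web.connections.allowTo(database);

app.synth();
```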
From a dev perspective, this is a massive productivity boost. It can auto-create both the ingress and egress policies. However, cdk8s generates multiple NetworkPolicies, one for each and every grant, with automatically generated names.
From a debugging and operational perspective, this makes it difficult when you’re viewing your cluster resources and trying to figure out which policy applies and what each one means.
*(listing of the auto-generated NetworkPolicies omitted)*
Difference in mental model compared to YAML
I expect differences when comparing a programming language to YAML templates because they’re fundamentally different styles of writing. YAML is the language for describing Kubernetes resources: tutorials use it, and this blog uses YAML for Kubernetes. With so much documentation written in YAML, you now have to mentally translate it into the cdk8s equivalent. Generally they look pretty similar; for example, a container looks about the same:
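In YAML (illustrative image and port):

```yaml
containers:
  - name: web
    image: nginx:1.25
    ports:
      - containerPort: 8080
```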
And the equivalent in CDK8s:
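A sketch with the same assumed values, where `deployment` is a cdk8s-plus Deployment defined elsewhere:

```ts
import * as kplus from 'cdk8s-plus-27';

declare const deployment: kplus.Deployment; // assume a Deployment defined elsewhere

deployment.addContainer({
  name: 'web',
  image: 'nginx:1.25',
  portNumber: 8080, // named `port` in older cdk8s-plus releases
});
```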
At first I didn’t even know about Pods.select and Namespaces.select and assumed I had to create my own custom peer class; it wasn’t until I started preparing to make a PR that I found the doc that explained this.
This is not unique to cdk8s; it’s prevalent in CDK vs CloudFormation too, or even Terraform/OpenTofu. It’s also not necessarily a terrible problem, but the more the field names start to differ, and the more you have to reach for different classes or pass around CDK contexts and names, the more confusing it gets.
Sure, I know the answer now, but these things will be confusing for the next person.
Unexpected defaults
cdk8s provides defaults for Deployments that I don’t think should be provided. For example, it provides a default resource request and a default security context.
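An abridged view of what ends up on the container (only the fields discussed here; the request value is illustrative):

```yaml
resources:
  limits:
    cpu: 1500m              # the 1.5-core limit discussed below
  requests:
    cpu: 1000m              # a default request is applied as well (value illustrative)
securityContext:
  readOnlyRootFilesystem: true
```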
CPU resource limits are bad. You probably shouldn’t use them unless you know your process has a maximum thread count. On top of that, cdk8s sets a limit of 1.5 CPU cores, which is not a round number, and if you do have two busy threads, one of them is going to get throttled for 50% of every second. I imagine they’re just picking some default value to help Kubernetes bin-pack, but in this case, I’d say they’re just picking a bad number.
While enforcing read-only root filesystems is good security practice™, it’s also likely to break a lot of software. So many of the components I run require the ability to write temp files and the like. I’m not sure why they chose this default. Maybe they wanted opt-out security, which, if the software can still run, is great, but it’s also very inconsistent.
My opinion: cdk8s should use the same defaults as Kubernetes itself. If they want to provide secure, robust solutions, they should offer a higher-level construct. I’ve seen this employed inside Amazon for security-sensitive CDK.
Conclusion
cdk8s provides some pretty useful developer-productivity improvements over writing raw YAML-defined resources. The built-in compile-time type-safety, code completion in an IDE, and the ability to abstract out repetitive code are real time savers.
However, it does have some downsides. The big one is the lack of that last bit of deployment tooling that actually gets all of this onto my Kubernetes cluster.