Series

Guaranteed Quality of Service in my Home Lab

A few times in my Kubernetes clusters, I’ve encountered situations where some process consumes all the CPU or RAM, starving other, more critical services of resources. For example, in one situation, Longhorn consumed all the CPU and RAM, and my Pi-hole running on the same machine stopped being able to process DNS requests. Other issues have included shutting down one of my worker nodes only to find that the other nodes didn’t have enough capacity to take on its pods, so important pods didn’t get scheduled, and a mistake when I changed the pod selector labels that made Kubernetes spawn thousands of pods.

The graph below shows the disk I/O of a node with excessive disk writes because the OS is swapping RAM out to disk and back.

Figure: A graph of disk I/O showing large amounts of disk I/O as the host swaps RAM to disk, until it finally fails.

My home lab servers are now running what I consider to be “business critical” services, and I don’t want those to be impacted. Kubernetes has several knobs we can use to improve this. It can leverage Linux’s cgroups to ensure that specific pods get a certain amount of CPU and RAM. It also supports prioritization, so that certain pods get scheduled and others get evicted if there isn’t enough capacity.
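As a rough sketch of those two knobs, a priority class plus a pod whose requests equal its limits might look like the following. The names `homelab-critical` and `pihole`, the image tag, and all of the numbers are placeholders chosen for illustration, not values taken from a real cluster.

```yaml
# Hypothetical PriorityClass; the name and value are illustrative assumptions.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: homelab-critical
value: 100000            # higher values are scheduled first and can preempt lower ones
globalDefault: false
description: "Pods that must keep running even when the node is full."
---
# Hypothetical pod that uses the priority class above.
apiVersion: v1
kind: Pod
metadata:
  name: pihole
spec:
  priorityClassName: homelab-critical
  containers:
  - name: pihole
    image: pihole/pihole:latest   # assumed image tag
    resources:
      # Setting requests equal to limits for CPU and memory on every
      # container places the pod in the Guaranteed QoS class.
      requests:
        cpu: 500m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 256Mi
```

The requests and limits are enforced through cgroups on the node, while the priority class influences which pods the scheduler places first and which it preempts when capacity runs out.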

More recently, I’ve also been hitting the max pod limit of 110 pods on my single-node cluster. It turns out it really is possible to be running 110 different pods. Not everything is important, and I want to make sure certain cron jobs always run even while I’m running some low-priority jobs.

Better Vault for Postgres access in my Home Lab

In my previous post on Vault, I showed how HashiCorp’s Vault can be used to protect important static passwords that don’t change frequently. Vault can do much more than this and can even automatically create temporary accounts and rotate passwords for database users.

Today, I’m using long-lived passwords that I generate once when I add a new service. Like most people, I just insert those passwords into the environment like this:

spec:
  containers:
  - env:
    # The password is embedded in plain text in the manifest
    - name: DATABASE_URL
      value: >-
        postgresql://username:mypassword@postgres:5432/database

That’s not secure at all. You can store them in Kubernetes Secrets instead, but Secrets aren’t encrypted by default. Kubernetes can be configured to encrypt them, but they’re still readable by anybody with access to the cluster, and the passwords are never rotated. This simply won’t do. In this post, I’m going to walk through how I switch to Vault for

Git pushes can be surprising

I was recently working on an open-source project (tryfi/hass-tryfi - A Home Assistant integration for pulling data from my dog’s collar using the TryFi API), and I found out that Git pushes can behave in a surprising way after I accidentally pushed a bunch of testing commits to the wrong branch.

A COE on why technowizardry.net went down

COE = Correction of Error

My previous employer, Amazon, was a big proponent of doing blameless analysis of outages and figuring out what could be done to prevent them. I recently had an outage on my servers and wanted to share what went wrong and the fix.

Summary

From Thursday until Friday, all TLS requests to a *.technowizardry.net domain failed due to a TLS certificate expiration error. Then on Friday, all DNS queries to a *.technowizardry.net zone failed, which also caused mail delivery to fail. The certificate expired because cert-manager had created the acme-challenge TXT record, but the record was not visible to the Internet: HE DNS was failing to perform an AXFR zone transfer from my authoritative DNS server. That, in turn, was because PowerDNS was unable to bind to port 53, since systemd-resolved was already listening on that port.

Over-engineering my Home Assistant HVAC Dashboard

Ever wondered how well your HVAC system is working in your home or condo? I did, to an unhealthy degree. I want to know not just the temperature, but also how often the system is running, what the supply and return temperatures are, and so on. Let’s over-engineer another project.