All posts by GauntletWizard

RV journal, May 10 2025

May 11, 2025 GauntletWizard Leave a comment

I’m at the grand canyon. The plan is simple. I’m going to hike south to the end of the trail, then back to the visitors center by bus, then to the north end of the trail, and back again by bus.
I started the day off a little late – I arrived at the canyon around 11. I parked at the marketplace and picked up some postcards. This was a bad idea in retrospect; the postcards were fine, but the point was to be near the post office, and I think I’ll write the letters tonight and mail them tomorrow.
I then walked over to the visitors center, and Mather point. There, I met a nice old Scottish Mormon lady named Renee. She told me about visiting people she’d been on mission trips with, forty years ago.
Having started my journey right, I got to hiking. I headed down the rim trail, walking slowly and taking in the sights. I got all the way to the south kaibab trailhead, then followed Google’s stupid directions and walked to yaki point. From there, I took the bus back to the visitors center.
Walking the rim trail is beautiful. Little can compare. The grand canyon doesn’t win any gold medals. There are other canyons that are longer, deeper, wider. This one, however, is grand. There’s something in the layers, in the coloration, in the way the land curves. Is it the most awe inspiring sight. It’s graceful yet hard, gentle and firm. The small glimpse of the Colorado River you see from the rim hints at the incredible power of time and water to wash away everything.

My full photo album from the Grand Canyon trip

Personal

Real time constant backups with ZFS+Zrepl

November 14, 2024 GauntletWizard Leave a comment

This is a guide to making your home network backups seamless, secure, and awesome. It’s comparable in many ways to Apple Time Machine.

Prerequisites:
A machine in the cloud with lots of disk space
Mine comes from zfs.rent, where I sent some physical hard drives and rent a small machine with them attached.
CA Infrastructure.
Creating your own private CA is a whole topic of it’s own, one I hope to make simpler. For the short term, creating a private CA just for Zrepl is the best option. Create the key with cfssl genkey -initca zrepl-ca.json | cfssljson -bare zrepl-ca

# /etc/zrepl/zrepl-ca.json - On your backup host

{
  "CN": "Personal Zrepl Root CA",
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "US",
      "L": "Seattle",
      "O": "Thomas Hahn",
      "OU": "Zrepl",
      "ST": "Washington"
    }
  ]
}

Setup
Create certificates for your hosts. They should use the hostname as the CN, but also have appropriate Subject Alternative Names. I recommend CFSSL to do so.

Example contents of your cfssl csr json:

# /etc/zrepl/zrepl.json
{
  "CN": "timemachine",
  "hosts": [
    "timemachine.internal",
    "timemachine.zfs.rent",
    "timemachine.gauntletwizard.net"
  ],
  "key": {
    "algo": "rsa",
    "size": 4096
  },
  "names": [
    {
      "C": "US",
      "L": "Seattle",
      "O": "Thomas Hahn",
      "OU": "Zrepl",
      "ST": "Washington"
    }
  ]
}

mkdir /etc/zrepl
cfssl genkey zrepl.json | cfssljson -bare zrepl-$HOSTNAME

You should now have a zrepl-key.pem and a zrepl.csr in your /etc/zrepl folder, on each of your machines. Copy all of the .csr files to your CA for signing. Don’t touch the -key.pem files! These are your private keys, and need to be secret. Once you have the csrs on your CA’s machine, sign them. Sign them with:

host=HOSTNAME # Replace HOSTNAME with the name of each host
cfssl sign -ca ca.pem -ca-key ca-key.pem "zrepl-${host}.csr" | cfssljson -bare ${host}

Signing them with the above will leave you with a set of .pem files, named host.pem. You should verify that they signed correctly: openssl verify -CAfile ca.pem host.pem. It should print ‘host.pem: ok’. Copy these files back to their respective hosts, and rename them on that host to zrepl.crt. Also copy the ca.pem file to each host as /etc/zrepl/ca.pem

Next, set up the backup as a sink:

global:
  logging:
    # use syslog instead of stdout because it makes journald happy
    - type: syslog
      format: human
      level: warn
  monitoring:
    - type: prometheus
      listen: ':9811'


jobs:
  - name: backups
    type: sink
    root_fs: tank/backups
    recv:
      placeholder:
        encryption: inherit
    serve:
      type: tls
      listen: ":8826"
      ca: "/etc/zrepl/ca.pem"
      cert: "/etc/zrepl/zrepl.crt"
      key: "/etc/zrepl/zrepl-key.pem"
      client_cns:
        - "desktop"
        - "laptop"
        - "timemachine"

zrepl uses the CN field for disambiguation. Add each host you signed above to the client_cns section in zrepl.yaml. zrepl will create a new zfs filesystem under your root_fs for each of these client cns upon their first backup, i..e. tank/backups/desktop and so on.

Next, configure your machines as sources:

global:
  logging:
    # use syslog instead of stdout because it makes journald happy
    - type: syslog
      format: human
      level: warn
  monitoring:
    - type: prometheus
      listen: ':9811'


jobs:
  - name: desktop
    type: push
    filesystems:
      "desktop/home<": true
    send:
      encrypted: false
    connect:
      type: tls
      address: "timemachine:8826"
      ca: /etc/zrepl/ca.pem
      cert: /etc/zrepl/zrepl.crt
      key:  /etc/zrepl/zrepl-key.pem
      server_cn: "timemachine"

    snapshotting:
      type: periodic
      prefix: zrepl_
      interval: 5m
    pruning:
      keep_sender:
        - type: not_replicated
        - type: regex
          regex: ".*"
      keep_receiver:
      - type: grid
        grid: 1x1h(keep=all) | 24x1h | 30x1d | 6x30d
        regex: "^zrepl_"

Personal

Building and pushing multiple manifests

November 10, 2023 GauntletWizard Leave a comment

I’m trying to build multiarch docker images. I’d like to sand it down as much as possible. In a better world, that would mean just one simple command. It’s not.

Here’s the first error I got from my simple script:

podman build --platform linux/amd64 . -t "${TAG}-amd64"
podman build --platform linux/arm64/v8 . -t "${TAG}-arm64"
podman manifest create "$TAG" "${TAG}-amd64" "${TAG}-arm64"

Error: setting up to read manifest and configuration from "docker://account.dkr.ecr.us-east-1.amazonaws.com/image:tag": reading manifest docker://account.dkr.ecr.us-east-1.amazonaws.com/image:tag: manifest unknown: Requested image not found

This didn’t work, and it turned out the reason was quite simple if obtuse – podman manifest wants to build from the real repositories. As I hadn’t pushed those images, it couldn’t find them on the remote repository.

I spent some time searching for a solution to build images locally, then build them into a manifest, and then finally tag them. I found a couple of things that should work, but didn’t:

podman manifest add MANIFEST containers-storage:image:tag
reference "[overlay@/home/ted/.local/share/containers/storage+/run/user/1000/containers]docker.io/library/image:tag" does not resolve to an image ID: identifier is not an image

I don’t know why this didn’t work. According to the docs on transports, containers-storage is the transport we can use to inspect local images. This is somewhat consistent in behavior:

podman build image:tag
podman tag 
podman inspect image:tag
...
podman  inspect containers-storage:repo/image:tag
...
podman inspect containers-storage:image:tag
Error: no such object: "containers-storage:image:tag
podman inspect containers-storage:localhost/image:tag
...

Containers-storage somewhat works, but you have to supply a hostname, which is “localhost” for otherwise unspecified images.

Another angle I tried is building both in a singular tag with the manifest flag. This seems like it should work.

podman build --platform linux/amd64,linux/arm64/v8 . --manifest image:tag

This actually worked – I didn’t realize it at first, but it built both architectures:

podman manifest inspect image:tag
{
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "size": 2444,
            "digest": "sha256:ea95462b074c650e6c477f8bf88bcfa0b6a021de7c550e2faca25c7f833bdc5f",
            "platform": {
                "architecture": "amd64",
                "os": "linux"
            }
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "size": 2444,
            "digest": "sha256:f1eb75a71b89b3655b845acd79076bc8d640d3db8fb0f24367748fb50b2e6001",
            "platform": {
                "architecture": "arm64",
                "os": "linux",
                "variant": "v8"
            }
        }
    ]
}

However, when I pushed my image, the wrong format was downloaded on my k8s nodes:

podman push image:tag

Containers:
  loadtest:
    Container ID:  containerd://5d157712c742aa63220c34eb2b5213b0cf580a50c5768406ff434910700a2638
    Image:         image:tag
    Image ID:      image:tag@sha256:d0345fbc0ec7c38fdcbedfb90e7b21986e2e9642856e7e2a62a0591d68d48f85

A significant amount of consternation later, I realized that because I was using podman push, the image was being resolved first, and then just the one architecture was pushed (but with tag for the whole . What I needed to do instead was podman manifest push, which pushed the whole manifest and all sub-images.

Personal

Adventures in EFI boot

September 19, 2022 GauntletWizard Leave a comment

I have a server that didn’t boot on it’s own, requiring manual intervention each startup. This is not optimal for a server-type machine (though the motherboard was never intended for that purpose. This machine had originally booted via Windows, and the Bios on the motherboard would not let me set an MBR entry above an EFI entry via conventional means.

First problem I encountered – I couldn’t set EFI settings while booted in classic (MBR) Mode. This was resolvable through pretty simple means, I booted into an installer/recovery linux image. This was enough for me to be able to use efibootmgr to set up boot order priority, and move the entry it had created for the classic MBR into that position.

That was enough for me to resolve my basic issue, but I wanted to do one better – I wanted to boot via EFI on my existing Ubuntu install. Grub supports EFI and it wasn’t that hard to get installed, but there’s some gotchas. My first attempt was thus:

grub-install --target=x86_64-efi --efi-directory=/boot/efi --debug

grub-install will install to the default subdirectory /boot/grub, with the EFI directory specified separately. The EFI System Partition is a FAT32 formatted disk; That’s all that’s required.

Next, I had to create the boot entry (Grub will generally do this for you, but because I was doing my grub install from my MBR disk:

efibootmgr -c -L "ted" -l '\efi\ubuntu\grubx64.efi' -d /dev/sda -p 2

This didn’t quite work. Eventually I gave up and reinstalled from scratch. 🙂

I’ve had more fun EFI adventures – I had a motherboard that wouldn’t respect a Grub EFI image unless there was a “Windows” image around, so I had to install with grub-install --removable which creates a BOOT.EFI file, otherwise the same.

Last but not least of my recent EFI problems, I managed to partially reinstall grub – My grub modules directory was updated, but grub itself was an old version, and the system was in a crash loop because of grub.cfg loading modules. Reinstalling grub was enough to fix it.

Personal

Borg Priorities

April 27, 2022 GauntletWizard Leave a comment

https://www.cs.cmu.edu/~harchol/Papers/EuroSys20.pdf

The priority of a job helps define how the scheduler treats it. Ranges of priorities that share similar properties are referred to as tiers: • Free tier: jobs running at these lowest priorities incur no internal charges, and have no Service Level Objectives (SLOs). 2019 trace priority <= 99; 2011 trace priority bands 0 and 1. • Best-effort Batch (beb) tier: jobs running at these priorities are managed by the batch scheduler and incur low internal charges; they have no associated SLOs. 2019 trace priority 110–115; 2011 trace priority bands 2–8. • Mid-tier: jobs in this category offer SLOs weaker than those offered to production tier workloads, as well as lower internal charges. 2019 trace priority 116–119; not present in the 2011 trace. • Production tier: jobs in this category require high availability (e.g., user-facing service jobs, or daemon jobs providing storage and networking primitives); internally charged for at “full price”. Borg will evict lower-tier jobs in order to ensure production tier jobs receive their expected level of service. 2019 trace priority 120–359; 2011 trace priority bands 9–10. • Monitoring tier: jobs we deem critical to our infrastructure, including ones that monitor other jobs for problems. 2019 trace priority >= 360; 2011 trace priority band 11. (We merged the small number of monitoring jobs into the Production tier for this paper.)
https://www.cs.cmu.edu/~harchol/Papers/EuroSys20.pdf

https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43438.pdf

2.5 Priority, quota, and admission control
What happens when more work shows up than can be accommodated? Our solutions for this are priority and quota.
Every job has a priority, a small positive integer. A highpriority task can obtain resources at the expense of a lowerpriority one, even if that involves preempting (killing) the
latter. Borg defines non-overlapping priority bands for different uses, including (in decreasing-priority order): monitoring, production, batch, and best effort (also known as
testing or free). For this paper, prod jobs are the ones in the
monitoring and production bands.

Personal

Upgrading PHP on Ubuntu

August 1, 2018 GauntletWizard Leave a comment

One of the weirdities that I have on my personal server is that my public facing site – www.gauntletwizard.net – is served from my personal `~/public-html/` folder. PHP is disabled from these folders by default, for good reason, but that reason is to keep PHP out of the hands of randos and I’m careful about who’s on my machine.

Anyway – There’s a stanza in /etc/apache2/mods-enabled/php-[7].conf that begins with `Running PHP scripts in user directories is disabled by default` – Do as it says and comment that section out.

Personal

Delete keys in redis non-atomically

April 25, 2018 GauntletWizard Leave a comment

There’s a lot of information out there about how to atomically delete a sequence of keys in Redis. That’s great, if you want to cause your production cluster to block for minutes at a time while you do so. If you’ve want to delete a bunch of keys with a scan, though, there’s less info.

redis-cli does support a --scan flag, which combined with a --pattern flag allows you to asynchronously list a set of prefixed keys – Like the keys command, except without causing your redis server to block. You can then use this output to feed an xargs command.

For example: redis-cli --scan -h "${REDISHOST}" --pattern "PATTERN" | tee keys | xargs redis-cli -h "${REDISHOST}" del | tee deletions

Personal

Prometheus alerting and questions

April 20, 2018 GauntletWizard Leave a comment

I’ve been switching my company over to Prometheus, and I’ve come across a few things that need discussion and opinions.

First, concrete advice:
Don’t just write an alert like
“`
alert: foo
expr: sum(rate(bar[5m])) > 5
“`
Write it so you record the rate, and then alert on that metric:
“`
record: bar:rate
expr: sum(rate(bar[5m]))
alert: foo
expr: bar:rate > 5
“`

From my Google days, I can say I should probably specify what the time is on that rate.

Questions:
1) How long should the rate window be? [5m]? [2m]? 3? 10?
* I’ve adopted 5m as standard across my company, being a compromise between being fast-moving and not overly smoothed
2) How long should alert `for`s be?
3) Metric naming
* I’m using `A_Metric_Name`; Not sure if this is right
4) Recorded rule naming
* I like `product:metric[:submetric]:unit` ; eg. houseparty:websockets_open:byDeviceType:sum

Personal

Kubernetes Build best practices

February 26, 2018 GauntletWizard Leave a comment

1) Squash your builds
This is now part of default docker, but it was well worth it even before. Docker will create a new tarball for each `stage` – Each ADD, RUN, etc creates a new layer that, by default, you upload. This means if you add secret material and then delete it – you haven’t really deleted it. More commonly, it bloats your image sizes. A couple intermediate files can be a huge pain, and waste your time and bandwidth uploading.

Don’t squash down to a single, monolithic image – Pick a good base point. Having a fully-featured image as a base layer is not a sin – So long as you reuse it, it doesn’t take up any more space or download time, so your lightweight squashed build can build on top of it.

2) Use Multistage builds
Your build environment should be every bit as much a container as your output. Don’t build your artifacts in your local machine and then add them to your images – You’re likely polluting your output with local state more than you know. Deterministic builds require you to understand the state of the build machine and make sure it doesn’t leak, and containers are a wonderful tool for that.

Alternatively:
Just use Bazel. Bazel’s https://github.com/bazelbuild/rules_docker is pretty simple to use, powerful, and generates docker-compatible images without actually running docker.

Personal

Migrating a SBT project to Bazel.

July 5, 2017 GauntletWizard Leave a comment

I’ve been working today on migrating a SBT project to Bazel. I’ve taken a few wrong turns, and I’ll document them later, but this will be my working doc and I’ll add some failures to the end.

Two major components – Bazel’s generate_workspace tool, and SBT’s make-pom command. You’ll create a POM file with the dependencies and repos.
ted:growth$ sbt make-pom [warn] Executing in batch mode. [warn] For better performance, hit [ENTER] to switch to interactive mode, or [warn] consider launching sbt without any commands, or explicitly passing 'shell' [info] Loading project definition from /Users/ted/dev/growth/project SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder". SLF4J: Defaulting to no-operation (NOP) logger implementation SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details. [info] Set current project to growth (in build file:/Users/ted/dev/growth/) [warn] Multiple resolvers having different access mechanism configured with same name 'Artifactory-lib'. To avoid conflict, Remove duplicate project resolvers (`resolvers`) or rename publishing resolver (`publishTo`). [info] Wrote /Users/ted/dev/growth/target/scala-2.11/growth_2.11-resurrector-9449dfb1de3b816c5fd74c4948f16496b38952ab.pom [success] Total time: 5 s, completed Jun 14, 2017 4:00:17 PM
This generates a pom file, but not exactly as generate_workspace wants it. It requires a directory with a pom.xml, so go ahead and turn that into one by making a tempdir and copying the file to it TMPDIR="$(mktemp -d)"; cp /Users/ted/dev/growth/target/scala-2.11/growth_2.11-resurrector-9449dfb1de3b816c5fd74c4948f16496b38952ab.pom "${TMPDIR}/pom.xml"

Next, build

So, on to the failures:
I initially tried to do my own workspace code generation. I took the output of sbt libraryDependencies and turned it into mvn_jar stanzas via script. This didn’t work, for the simple reason that I wasn’t doing it transitively, they mention that in the generate_workspace docs. I also tried specifying that list of deps as a big list of –archive stanzas; That turned out to be a mistake, mostly because of alternate repos. I also had to clean out a broken SBT set of repos; bazel does not play well with repeated repo definitions, while SBT is happy to ignore them.

Wyld Stallyns

All posts by GauntletWizard

RV journal, May 10 2025

Real time constant backups with ZFS+Zrepl

Building and pushing multiple manifests

Adventures in EFI boot

Borg Priorities

Upgrading PHP on Ubuntu

Delete keys in redis non-atomically

Prometheus alerting and questions

Kubernetes Build best practices

Migrating a SBT project to Bazel.

Ted's Excellent Adventure.