All posts by GauntletWizard

Upgrading PHP on Ubuntu

One of the weirdities that I have on my personal server is that my public-facing site – – is served from my personal `~/public-html/` folder. PHP is disabled in these folders by default, for good reason, but that reason is to keep PHP out of the hands of randos and I’m careful about who’s on my machine.

Anyway – There’s a stanza in /etc/apache2/mods-enabled/php-[7].conf that begins with `Running PHP scripts in user directories is disabled by default` – Do as it says and comment that section out.
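A rough sketch of doing that commenting-out from the shell. It assumes the stanza is wrapped in `<IfModule mod_userdir.c>` ... `</IfModule>`, which is how I'd expect the stock Ubuntu file to look; check yours before running it. The function name is made up for illustration.

```shell
# Comment out the userdir stanza in the given Apache PHP conf file.
# Assumption: the stanza runs from "<IfModule mod_userdir.c>" to the next "</IfModule>".
comment_out_userdir_block() {
  local conf="$1"
  # -i.bak edits in place and keeps the original as "$conf.bak".
  # Every line in the matched range gets a leading '#'.
  sed -i.bak '/<IfModule mod_userdir.c>/,/<\/IfModule>/ s/^/#/' "$conf"
}
```

Call it with the path to the conf file mentioned above (as root), then reload Apache so the change takes effect.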

Delete keys in redis non-atomically

There’s a lot of information out there about how to atomically delete a sequence of keys in Redis. That’s great, if you want to cause your production cluster to block for minutes at a time while you do so. If you want to delete a bunch of keys with a scan, though, there’s less info.

redis-cli does support a --scan flag, which combined with a --pattern flag allows you to asynchronously list a set of prefixed keys – Like the keys command, except without causing your redis server to block. You can then use this output to feed an xargs command.

For example: redis-cli --scan -h "${REDISHOST}" --pattern "PATTERN" | tee keys | xargs redis-cli -h "${REDISHOST}" del | tee deletions
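If the key list is long, it may be worth capping how many keys go into each DEL. Here's a sketch of the same pipeline wrapped in a function with `xargs -n`; the function name and the batch size of 100 are my own picks, not anything Redis requires. Keys containing whitespace or quotes will be mis-split by xargs, so this assumes plain key names.

```shell
# Scan for keys matching a pattern and delete them in batches.
# Writes the scanned keys to ./keys and the per-batch DEL replies to ./deletions.
delete_matching_keys() {
  local host="$1" pattern="$2" batch="${3:-100}"
  redis-cli --scan -h "$host" --pattern "$pattern" \
    | tee keys \
    | xargs -n "$batch" redis-cli -h "$host" del \
    | tee deletions
}
```

Usage would be something like `delete_matching_keys "${REDISHOST}" "PATTERN"`; each line in `deletions` is the number of keys removed by one batch.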

Prometheus alerting and questions

I’ve been switching my company over to Prometheus, and I’ve come across a few things that need discussion and opinions.

First, concrete advice:
Don’t just write an alert like
alert: foo
expr: sum(rate(bar[5m])) > 5
Write it so you record the rate, and then alert on that metric:
record: bar:rate
expr: sum(rate(bar[5m]))
alert: foo
expr: bar:rate > 5

From my Google days, I can say I should probably specify the rate window in that recorded metric’s name.
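Put together as a rules file, the record-then-alert pattern might look like the sketch below. It assumes the YAML rule format from Prometheus 2.x; the group name `example`, the `for: 5m` and the `rate5m` suffix are placeholders I picked, and the function just writes the file so you can look at it.

```shell
# Write an example rules file using the record-then-alert pattern.
# Assumptions: Prometheus 2.x YAML rules; group name, `for` duration and
# the "rate5m" suffix are illustrative only.
write_example_rules() {
  local out="${1:-rules.yml}"
  cat > "$out" <<'EOF'
groups:
  - name: example
    rules:
      # Record the rate once, with the window in the name.
      - record: bar:rate5m
        expr: sum(rate(bar[5m]))
      # Alert on the recorded series rather than the raw expression.
      - alert: foo
        expr: bar:rate5m > 5
        for: 5m
EOF
}
```

Running `promtool check rules rules.yml` on the result should catch syntax mistakes before Prometheus loads it.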

1) How long should the rate window be? [5m]? [2m]? 3? 10?
* I’ve adopted 5m as standard across my company, being a compromise between being fast-moving and not overly smoothed
2) How long should alert `for`s be?
3) Metric naming
* I’m using `A_Metric_Name`; Not sure if this is right
4) Recorded rule naming
* I like `product:metric[:submetric]:unit` ; eg. houseparty:websockets_open:byDeviceType:sum

Kubernetes Build best practices

1) Squash your builds
This is now part of default docker, but it was well worth it even before. Docker will create a new tarball for each step – each ADD, RUN, etc. creates a new layer that, by default, you upload. This means that if you add secret material and then delete it, you haven’t really deleted it. More commonly, it bloats your image sizes. A couple of intermediate files can be a huge pain, and waste your time and bandwidth uploading.

Don’t squash down to a single, monolithic image – Pick a good base point. Having a fully-featured image as a base layer is not a sin – So long as you reuse it, it doesn’t take up any more space or download time, so your lightweight squashed build can build on top of it.

2) Use Multistage builds
Your build environment should be every bit as much a container as your output. Don’t build your artifacts in your local machine and then add them to your images – You’re likely polluting your output with local state more than you know. Deterministic builds require you to understand the state of the build machine and make sure it doesn’t leak, and containers are a wonderful tool for that.

Just use Bazel. Bazel is pretty simple to use, powerful, and generates docker-compatible images without actually running docker.

Migrating an SBT project to Bazel.

I’ve been working today on migrating an SBT project to Bazel. I’ve taken a few wrong turns, and I’ll document them later, but this will be my working doc and I’ll add some failures to the end.

Two major components – Bazel’s generate_workspace tool, and SBT’s make-pom command. You’ll create a POM file with the dependencies and repos.

ted:growth$ sbt make-pom
[warn] Executing in batch mode.
[warn] For better performance, hit [ENTER] to switch to interactive mode, or
[warn] consider launching sbt without any commands, or explicitly passing 'shell'
[info] Loading project definition from /Users/ted/dev/growth/project
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See for further details.
[info] Set current project to growth (in build file:/Users/ted/dev/growth/)
[warn] Multiple resolvers having different access mechanism configured with same name 'Artifactory-lib'. To avoid conflict, Remove duplicate project resolvers (`resolvers`) or rename publishing resolver (`publishTo`).
[info] Wrote /Users/ted/dev/growth/target/scala-2.11/growth_2.11-resurrector-9449dfb1de3b816c5fd74c4948f16496b38952ab.pom
[success] Total time: 5 s, completed Jun 14, 2017 4:00:17 PM

This generates a pom file, but not exactly as generate_workspace wants it. It requires a directory with a pom.xml, so go ahead and turn that into one by making a tempdir and copying the file to it:

TMPDIR="$(mktemp -d)"; cp /Users/ted/dev/growth/target/scala-2.11/growth_2.11-resurrector-9449dfb1de3b816c5fd74c4948f16496b38952ab.pom "${TMPDIR}/pom.xml"

Next, build

So, on to the failures:
I initially tried to do my own workspace code generation. I took the output of sbt libraryDependencies and turned it into maven_jar stanzas via a script. This didn’t work, for the simple reason that I wasn’t doing it transitively; the generate_workspace docs mention that. I also tried specifying that list of deps as a big list of --artifact flags; that turned out to be a mistake, mostly because of alternate repos. I also had to clean out a broken set of SBT repos; Bazel does not play well with repeated repo definitions, while SBT is happy to ignore them.


The big companies I’ve worked at have all had security policies. The small companies haven’t. Frequently, all access to production machines has been controlled by a single shared ssh key. This sucks, but is inevitable, given the lack of time to spend on tooling. However, there are some low-cost tools to make this better.

The basic developer workflow has been – Type in a command, which will generate an SSH certificate, then ask you for your password and u2f auth, and it’ll talk to the central signing server and get that cert signed. This is surprisingly doable for a small org – BLESS and CURSE are two alternatives.

For myself, though, the right thing to do is run ssh-agent. ssh-agent allows you to keep your keys in memory, and can support several keys. It also allows for forwarding the auth socket to a remote host – So if you need to ssh through a bastion host, you don’t have to copy your SSH key to the bastion machine, it can live on your local drive and all authentication requests can go through it. ssh -A enables this forwarding.

The other problem I’ve encountered a few times is that I want to share my ssh-agent across several terminals. This can be a blessing or a curse, but on most of my machines I only have one or two keys, and while I want them encrypted at rest I don’t care if they stay loaded in memory. I’ve written the shell script that does this a bunch of times, and today I asked myself why it’s not in the default ssh toolkit (like ssh-copy-id). Well, it’s not, but there is a tool that does what I’m looking for: Keychain, not to be confused with the OSX tool of the same name. Though, to my surprise, OSX *already has this functionality*; my default terminal opens up with an SSH_AUTH_SOCK already populated, and it’s managed by the system. That’s pretty cool.
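For reference, the kind of script I mean looks roughly like this. It's a minimal sketch, not what Keychain does internally; the function names and the `~/.ssh/agent.env` location are just what I'd pick. It leans on `ssh-add -l` exiting with status 2 when it can't reach an agent.

```shell
# True if an ssh-agent is reachable through the current SSH_AUTH_SOCK.
# ssh-add -l exits 2 only when it cannot contact an agent.
agent_reachable() {
  local status=0
  ssh-add -l >/dev/null 2>&1 || status=$?
  [ "$status" -ne 2 ]
}

# Reuse a running agent if there is one, otherwise start one and record
# its environment in a file that other terminals can source.
share_ssh_agent() {
  local env_file="${1:-$HOME/.ssh/agent.env}"  # hypothetical location
  agent_reachable && return 0
  if [ -r "$env_file" ]; then
    . "$env_file" >/dev/null
    agent_reachable && return 0
  fi
  # Start a fresh agent; keep the env file private to this user.
  (umask 077; ssh-agent -s > "$env_file")
  . "$env_file" >/dev/null
}
```

Calling `share_ssh_agent` from your shell startup file means every new terminal picks up the same agent, and a single `ssh-add` covers all of them.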

Annotated git config.

[push]
# Much saner than the old behavior, and new default.
default = simple
[user]
# Duh.
email =
name = Ted Hahn
# Corresponds to my signing key.
signingkey = 1CA0948A
[pull]
# When pulling, rebase my feature branches on top of what they’ve just pulled.
rebase = true
[commit]
# Sign all commits
gpgsign = true

Bash tips.

Here are some things you should start most bash scripts with:

#!/bin/bash
set -e
set -x
set -o pipefail
set -u

TMPDIR=$(mktemp -d)
trap 'rm -rf "$TMPDIR"' EXIT

Explanations of the lines:

#!/bin/bash

The shebang line is a Unix convention that allows scripts to specify their interpreter. Since this is a bash script, we tell it to run this file with bash.

set -e

Exit immediately if any command fails. Makes it easy to spot when a script did not complete, and prevents things further down the line from doing the wrong thing because they were only partially setup.

set -x

Print each command as it’s run. It’s fantastically useful debug output, though some production scripts should have this disabled.

set -o pipefail

Exit with failure if any substage of a pipeline fails. This is about commands chained together with a pipe; e.g. If your grep command fails, the execution will fail, rather than simply outputting nothing to the next stage of the pipeline.

set -u

Makes referencing unset variables an error.

Further explanation of the above options can be found in the Bash Reference Manual entry on Set.
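If you want to see what pipefail changes, here's a small sketch. The function name is made up; it runs `false | cat` in a subshell with pipefail on or off and prints the pipeline's exit status.

```shell
# Print the exit status of `false | cat` with pipefail switched "on" or "off".
pipeline_status() {
  (
    set +e   # keep the subshell alive so the status can be printed
    if [ "$1" = on ]; then set -o pipefail; else set +o pipefail; fi
    false | cat
    echo "$?"
  )
}
```

`pipeline_status off` prints 0, because only the last command in the pipe counts; `pipeline_status on` prints 1, because the failing `false` is no longer swallowed.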

TMPDIR=$(mktemp -d)
trap 'rm -rf "$TMPDIR"' EXIT

Create a scratch dir, automatically delete it when you’re done. It’s often useful to comment out the trap line during debugging.

See also Pixelbeat’s blog on Common shell script mistakes

Symlinks are (not) hard.

I’ve got two amusing anecdotes related to symlinks. By amusing anecdotes, I of course mean incredibly frustrating weird behaviors that took hours to debug. One Java, one Chef.


Chef handles environments very well… except when it comes to databags. From my perspective, this is a critical flaw, since the things I want to keep out of the main chef repo (API keys and passwords) are also the things most likely to be affected by the environment. So, when building, we specify the path to the chef databags, separating out the prod, canary, and dev environments.

For the parts that are common between the databags, I figured I’d use symlinks. Our databags are stored in a git repo, and git interprets symlinks correctly. The full set of databags was copied everywhere, so I could simply include a relative symlink to ../../prod/foo/bar.json for each databag I wanted consistent. I got the following error:

syntax error, unexpected end-of-input

pointing to a character in the middle of the first line in the file. This made no sense.

It took me several tries with different files to figure out what was going on. The position of the character being pointed at, x, was the same as the number of characters in the symlink path. A symlink is sorta just a text file with a pathname and a special flag on it. If you stat the symlink itself (an lstat), you’ll get the length of that pathname, not the size of the file it points to. What Chef seems to be doing is stat-ing that file, then taking that length as gospel – it doesn’t process it as a stream, but as a block of the stat’d size.
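You can see the two sizes side by side with a few lines of shell. This is only a sketch to illustrate the size mismatch (the function name is mine, and it says nothing about Chef's actual code); it reads the link's own size from `ls -ld` and the target's size by reading through the link.

```shell
# Print "<size of the symlink itself> <size of the file it points to>".
symlink_vs_target_size() {
  local link="$1"
  local link_size target_size
  # ls -ld reports on the link itself; field 5 is its size,
  # which is the length of the stored path.
  link_size=$(ls -ld "$link" | awk '{print $5}')
  # Reading through the link follows it to the real file.
  target_size=$(wc -c < "$link" | tr -d ' ')
  echo "$link_size $target_size"
}
```

For a link created with `ln -s real.json link.json` pointing at a 10-byte file, this prints `9 10`: nine characters of path, ten bytes of content. Read only the first nine bytes of the JSON and you get exactly the kind of truncated-input parse error above.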

I should probably get around to testing that with the latest version and writing a bug.


Java has a really simple package deployment mechanism: JARs. You can put a bunch of classes into a jar, and deploy them as one. If you have a project with a bunch of dependencies, you can ‘shade’ your jar and wrap all your classes into a single mono-jar.

However, for some use cases it’s not that simple. Java up to 1.7 simply won’t accept more than INT_16_MAX class files in a jar (and remember that anonymous classes are a separate file). Further, signatures can’t be retained; A jar has a signing key attached, and all files must be signed using that same signing key, so a ‘shaded’ jar can’t include the original signatures of dependencies.

So, since monolithic jars don’t work in some cases, what do you do instead? You ship several jars. It’s well documented but not well understood that when you specify a jar with java -jar, your classpath is ignored. How do you load multiple jars, then?

Inside the jar is a META-INF folder containing a MANIFEST.MF file. This manifest file contains a bunch of key-value pairs, and one of those keys can be Class-Path. This class-path key can specify additional jars or directories, and it usually will. However, because of deployment concerns, it will generally list them as relative paths or just as filenames. How does java find those files?

In about the worst way possible. Java will dereference any symlinks in the jar it is loading, then search the base directory of the final file it reads for the class-path includes. So, if you have a bunch of projects with common includes, you cannot simply symlink in all your dependency jars; You need hard copies of every jar you include. This also means you can’t simply update a dependency jar in one place, you have to hard-link it in to the working directory of every app you want to deploy.

I guess an option is to simply have a big folder full of all the jars for all the apps you want to run, but that folder can get very cluttered, and it becomes unclear what’s there why – is one of your dependencies shared? Do you have a garbage-collection mechanism for older jars in that folder?

On Monorepos vs Project repos

I’ve seen some talk about whether to keep everything in one codebase, vs having per-project repositories. The answer is very clear to me: Monolithic repos are a must, but Git submodules are functionally equivalent (as I’ll describe later); You should start with one repo, and then subdivide when you have clear submodules.

The importance of monolithic repos vs per project is not about performance, or even directly about organization of your code. Both are fairly clear. It’s about organization of your build system. Good builds are fully deterministic and idempotent, and that is very hard to achieve with a set of per-project repositories.

The unix standard has a very good layout for where code goes. Shared libraries go in /lib, /usr/lib, /usr/local/lib; Headers go in /usr/include or /usr/local/include. But this isn’t a structure for how to organize your code when writing it; It’s a build environment structure. It makes more sense when your entire organization is sharing just one unix machine and environment, but we’re well beyond that.

Because you’re aiming for determinism, you need to be sure that your build environment is the same each time. The traditional unix file structure is not good for this purpose, at least, not directly. That structure is not rebuilt every time; You copy files on top of it, you add and remove, but you don’t reset. You can accomplish a reset – Reimaging your build machine, or using a docker container. And, in fact, the latter is what many people have switched to. But that docker container is another layer of abstraction, another piece you need to manage for full determinism. You need to kill and rebuild it each build, and most solutions don’t. They run new builds in the same container repeatedly, creating uncertainty.

This is why I like Bazel so much. It removes much of the uncertainty, by rebuilding from scratch each time. It has a separate, well defined environment, that it manages and assures is in the correct state. It’s not magic; You can change things, break things, and fool it. But if you don’t touch it, it does *the right thing* and keeps your structure clean, without taking any risks of breaking your whole machine.

Bazel operates on a concept called a ‘workspace’. There’s not a whole lot to it; You pick an arbitrary root directory, and a flag file defines the workspace root. Everything underneath is considered one logical unit. If you’ve got a monolithic codebase, this is a no-brainer.

Git submodules complicate builds a little, but not much. Instead of saying “this build was built at commit A”, you need to know that it was built at commits (A, B, C). But you probably don’t really care to always know the full set of A, B, C; They may move somewhat independently, but your build infrastructure can and should simply serialize them; Keeping a mapping of simple, linear commit numbers to a tuple of commit hashes for each submodule. There’s one disadvantage – It makes ‘who broke the build’ a race condition, if two modules change at the same time. That is solved by a simple answer: Suck it up, both committers should debug 🙂
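The serialized mapping doesn't need to be fancy. Here's a sketch that keeps it in a flat text file, one line per build; the function name and the ledger format are invented for illustration, and a real build system would need locking around the append.

```shell
# Append the next linear build number plus a tuple of commit hashes to a ledger.
# Prints the build number that was assigned.
record_build() {
  local ledger="$1"; shift
  local next=1
  if [ -s "$ledger" ]; then
    # One line per build, so the next number is the line count plus one.
    next=$(( $(wc -l < "$ledger") + 1 ))
  fi
  echo "$next $*" >> "$ledger"
  echo "$next"
}
```

The hashes would come from `git rev-parse HEAD` in the top-level repo and `git submodule status` for the submodules, so build 42 can always be traced back to its (A, B, C).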