Symlinks are (not) hard.

I’ve got two amusing anecdotes related to symlinks. By amusing anecdotes, I of course mean incredibly frustrating weird behaviors that took hours to debug. One java, one chef.

Chef

Chef handles environments very well… except when it comes to databags. From my perspective, this is a critical flaw, since the things I want to keep out of the main chef repo (API keys and passwords) are also the things most likely to be affected by the environment. So,  when building, we specify the path to the chef databags, separating out the prod, canary, and dev environments.

For the parts that are common between the databags, I figured I’d use symlinks. Our databags are stored in a git repo, and git interprets symlinks correctly. The full set of databags were copied everywhere, so I could simply include a relative symlink to ../../prod/foo/bar.json for each databag I wanted consistent.  I got the following error:

syntax error, unexpected end-of-input

pointing to a character in the middle of the first line in the file. This made no sense.

It took me several tries with different files to figure out what was going on. The character that was being pointed out, x, was the same as the number of characters in the symlink path. A symlink is sorta just a text file with a pathname and a special flag on it. If you stat the symlink file, you’ll get the length of that pathname, not the size of the file it points to. What Chef seems to be doing is stat-ing that file, then taking that length as gospel – It doesn’t process it as a stream, but as a block of the stat’d size.

I should probably get around to testing that with the latest version and writing a bug.

Java

Java has a really simple package deployment mechanism: JARs. You can put a bunch of classes into a jar, and deploy them as one. If you have a project with a bunch of dependencies, you can ‘shade’ your jar and wrap all your classes into a single mono-jar.

However, for some use cases it’s not that simple. Java up to 1.7 simply won’t accept more than INT_16_MAX class files in a jar (and remember that anonymous classes are a separate file). Further, signatures can’t be retained; A jar has a signing key attached, and all files must be signed using that same signing key, so a ‘shaded’ jar can’t include the original signatures of dependencies.

So, since monolithic jars don’t work in some cases, what do you do instead? You ship several jars. It’s well documented but not well understood that when you specify a jar with java -jar that your classpath is ignored. How do you load multiple jars, then?

Inside the jar is a META-INF folder containing a MANIFEST.MF file. This manifest file contains a bunch of key-value pairs, and one of those keys can be Class-Path. This class-path key can specify additional jars or directories, and it usually will. However, because of deployment concerns, it will generally list them as relative paths or just as filenames. How does java find those files?

In about the worst way possible. Java will dereference any symlinks in the jar it is loading, then search the base directory of the final file it reads for the class-path includes. So, if you have a bunch of projects with common includes, you cannot simply symlink in all your dependency jars; You need hard copies of every jar you include. This also means you can’t simply update a dependency jar in one place, you have to hard-link it in to the working directory of every app you want to deploy.

I guess an option is to simply have a big folder full of all the jars for all the apps you want to run, but that folder can get very cluttered, and it becomes unclear what’s there why – is one of your dependencies shared? Do you have a garbage-collection mechanism for older jars in that folder?