In Bazel you put files into rules and get files out, e.g.:

    pkg_tar(
        name = "package",
        extension = "tar.gz",
        srcs = [":file1", ":file2"],
    )
The pkg_tar rule takes :file1 and :file2 and spits out the tarball :package.tar.gz, which you can pass to another rule as input. So you feed files into rules, which output files into other rules, which output files into other rules… until eventually you get the file you want. This forms a graph that can look like:
[Figure: part of the github.com/sorbet/sorbet rule graph]
Rules are made up of many actions. Actions are just scripts, usually written in Python or Bash (because most systems can execute them). Like rules, actions take files in and spit files out.
The output of an action should depend only on its explicitly stated inputs. That is, actions should be hermetic: isolated from everything but their explicit dependencies. This is Bazel’s broader philosophy, and most of its design decisions are consequences of it.
If every action is hermetic, we get a giant benefit: speed! We can run an action once, cache its output, and not run it again until its inputs change. Also, with explicit inputs and outputs, Bazel can construct a graph of all actions and work out an efficient, parallel execution order. Bazel does this in three phases:
- Loading Phase: Load all the rules.
- Analysis Phase: Calculate the action graph and hash the inputs, to look them up in the cache and see what needs to be run.
- Execution Phase: Process the necessary actions.
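To make the caching idea concrete, here is a minimal Python sketch of the phases described above. It is not how Bazel is implemented: the `actions` table, `action_key`, and `build` are hypothetical names invented for illustration, and the cache key only covers the action name and input contents (a real system would also hash the command itself).

```python
import hashlib
from graphlib import TopologicalSorter  # Python 3.9+


def action_key(name, input_blobs):
    """Hash the action name together with the content of every declared input."""
    digest = hashlib.sha256(name.encode())
    for blob in input_blobs:
        digest.update(hashlib.sha256(blob).digest())
    return digest.hexdigest()


def build(sources, actions, cache, log):
    """Run actions in dependency order, skipping any whose key is already cached.

    sources: {file name: bytes}
    actions: {output name: (list of input names, function(list of bytes) -> bytes)}
    cache:   {key: output bytes}, reused between builds
    log:     list that records which actions actually ran
    """
    # "Analysis": derive the graph from the declared inputs only.
    graph = {
        name: [i for i in inputs if i in actions]
        for name, (inputs, _) in actions.items()
    }
    files = dict(sources)
    # "Execution": walk the graph, consulting the cache before running anything.
    for name in TopologicalSorter(graph).static_order():
        inputs, run = actions[name]
        blobs = [files[i] for i in inputs]
        key = action_key(name, blobs)
        if key not in cache:
            cache[key] = run(blobs)
            log.append(name)
        files[name] = cache[key]
    return files


sources = {"a.txt": b"hello", "b.txt": b"world"}
actions = {
    "joined": (["a.txt", "b.txt"], lambda blobs: b" ".join(blobs)),
    "upper": (["joined"], lambda blobs: blobs[0].upper()),
}
cache, log = {}, []
build(sources, actions, cache, log)  # first build runs both actions
build(sources, actions, cache, log)  # second build is all cache hits
```

Because the key is derived only from declared inputs, an action that secretly reads something else (the clock, the system's tar version) would return stale cache hits, which is why hermeticity matters so much here.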
Separating the Execution and Analysis phases means we have to register the actions to let Bazel decide when to run them. For example, the action to create a tarball in [pkg_tar](https://github.com/bazelbuild/bazel/blob/master/tools/build_defs/pkg/pkg.bzl#L120) looks like:

    ctx.actions.run(
        executable = ctx.executable.build_tar,
        inputs = file_inputs + ctx.files.deps + [arg_file],
        arguments = ["--flagfile", arg_file.path],
        outputs = [ctx.outputs.out],
    )
This code is written in Starlark, a Python-esque language that comes with specific limitations and different core libraries to encourage hermetic actions. A quick explanation of this code:
- [ctx](https://docs.bazel.build/versions/master/skylark/lib/ctx.html) is the context of the rule.
- ctx.actions is how you register an action to a rule.
- ctx.actions.run is an action that calls a script.
- executable is a reference to the build_tar script to be run.
- inputs are all files needed to run this script.
- arguments are sent to the executable.
- outputs are all files this action generates.
By explicitly specifying the inputs and outputs and running a hermetic script, Bazel can build large projects very fast.
Hermetic Tools
But wait, what the hell is the _build_tar_ executable above? Also, those arguments won’t work with _tar_, will they?
build_tar is a Python reimplementation of tar. Why would Bazel need to reimplement _tar_? Because tar is not hermetic. Try this out:

    $> tar -cz file1 | sha256sum
    d0f...44b
    $> tar -cz file1 | sha256sum
    ee2...777
tar’s output changes because it embeds a creation timestamp, so the result depends on something not explicitly stated as an input. This can be worked around, but the workaround depends on the version of tar on the system, which is itself an undeclared dependency.
To make this hermetic, build_tar sets the date to the “implausibly old” timestamp 1970-01-01 00:00:00. This means we can’t (easily) use tar in an action, even though it has been a common tool for 40 years!
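The timestamp trick can be sketched with Python's standard tarfile module. This is not build_tar's actual code, only an illustration of the same idea; `deterministic_tar` is a hypothetical helper, and it also fixes member order and ownership, which are further assumptions about what a reproducible archive needs.

```python
import hashlib
import io
import tarfile


def deterministic_tar(files):
    """Build an uncompressed tar archive from {name: bytes} with fixed metadata."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w", format=tarfile.GNU_FORMAT) as archive:
        for name in sorted(files):  # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0  # 1970-01-01 00:00:00, regardless of when we run
            info.mode = 0o644
            info.uid = info.gid = 0  # do not leak the builder's user
            info.uname = info.gname = ""
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


first = deterministic_tar({"file1": b"hello", "file2": b"world"})
second = deterministic_tar({"file2": b"world", "file1": b"hello"})
# Same declared inputs, same bytes out, so the digests match on every run.
print(hashlib.sha256(first).hexdigest() == hashlib.sha256(second).hexdigest())
```

With every piece of metadata pinned, the archive's hash depends only on the file names and contents, which is exactly the property a cacheable action needs.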
This is where Bazel really starts to lose people. Many existing tools are not hermetic, so Bazel can’t reliably cache their output. To fix this, we need to reimplement these tools from scratch. For example…
Down the Rabbit Hole with Docker
Docker is a great tool. However, I have previously described how docker build isn’t hermetic, even if the built images are identical. Containers are a reality of modern development, so how are we meant to use Bazel to build them?
Use [**rules_docker**](https://github.com/bazelbuild/rules_docker), the re-implementation of **docker build** in Bazel. Containers aren’t anything more than tarballs with a manifest. rules_docker contains rules and actions for squashing tarballs together to create new containers.
Typically the first RUN command in a Dockerfile is apt-get update && apt-get install ... so how are we going to _apt-get_ hermetically?
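As a rough illustration of "squashing tarballs together", the Python sketch below merges layer archives so that later layers override earlier paths. It is not rules_docker's implementation: `squash_layers` and `make_layer` are hypothetical helpers, and the sketch ignores manifests, whiteout files, and everything else a real image format requires.

```python
import io
import tarfile


def make_layer(files):
    """Hypothetical helper: pack {path: bytes} into an in-memory tar layer."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path, data in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def squash_layers(layers):
    """Merge tar archives (bytes) into one; later layers override earlier paths."""
    merged = {}  # path -> (TarInfo, content bytes, or None for non-files)
    for layer in layers:
        with tarfile.open(fileobj=io.BytesIO(layer), mode="r") as archive:
            for member in archive.getmembers():
                content = archive.extractfile(member).read() if member.isfile() else None
                merged[member.name] = (member, content)
    out = io.BytesIO()
    with tarfile.open(fileobj=out, mode="w", format=tarfile.GNU_FORMAT) as archive:
        for path in sorted(merged):  # fixed order keeps the result reproducible
            member, content = merged[path]
            member.mtime = 0  # same timestamp trick as build_tar
            archive.addfile(member, io.BytesIO(content) if content is not None else None)
    return out.getvalue()


base = make_layer({"etc/os-release": b"debian", "app/config": b"v1"})
top = make_layer({"app/config": b"v2", "app/bin": b"binary"})
image = squash_layers([base, top])  # app/config now holds b"v2"
```

Since the inputs are plain archives and the output metadata is pinned, the same layers always produce byte-identical output, which keeps the step cacheable.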
Use distroless and its re-implementation of apt-get/dpkg. apt-get update downloads a list of tarball deb packages, which can be selected and extracted with apt-get install. Distroless contains rules to smash deb tarballs into container tarballs using an unchanging Debian repository snapshot to make sure we always get the same versions.
Some deb packages, like ca-certificates, have installer scripts. How are we going to deal with non-hermetic installer scripts?
Re-implement them as Bazel rules. For example, the ca-certificates install script is re-implemented with the [cacert](https://github.com/GoogleContainerTools/distroless/blob/master/cacerts/cacerts.bzl) rule and [extract.sh](https://github.com/GoogleContainerTools/distroless/blob/master/cacerts/extract.sh) script in distroless. This can be quite some work and requires knowledge that apt-get typically abstracts away.
Falling down this rabbit hole causes us to throw away many existing tools and reimplement them hermetically.

Up to you
There are many more issues with Bazel (laid out here) but I think most stem from Bazel’s core philosophy of hermetic builds. The speed and reliability are undeniable, but so is the pain when you throw away a tool you like and have to reimplement it Bazel’s way.
I liken Bazel to Haskell. Both have rigid, inflexible philosophies: Bazel with being hermetic and Haskell with being functionally pure. This can immediately turn people off, as they have to throw away well-known tools like docker or for loops. The few who do stick around can come out of the fire with a deeper understanding of the tradeoffs they are making.
At the end of the day, if the benefits outweigh the downsides, maybe Bazel is the tool for you. I won’t recommend it, just as I wouldn’t recommend Haskell, because it is up to you whether you will trade pain for speed.
