Do you want to see something annoying? Create a Dockerfile like this:

```
FROM ubuntu:18.04@sha256:9b1...b3c
RUN echo "Hello World"
```

Now build it without cache twice:

```
$> docker build --no-cache -q .
sha256:4e0...20b
$> docker build --no-cache -q .
sha256:e28...3fa
```

Why are the SHA digests different? They should be exactly the same… right? The file system didn’t change, and both images were built in the same environment.

To solve this mystery, let’s take a look inside each of these images. Note that docker save writes an uncompressed tar, so no -z flag is needed when extracting:

```
$> docker save sha256:4e0...20b > a.tar
$> docker save sha256:e28...3fa > b.tar

$> mkdir a
$> mkdir b

$> tar -xvf a.tar -C a
$> tar -xvf b.tar -C b
```

The directories a and b now look like:

```
# layer folders
├── 478…d90/
│   ├── json
│   ├── layer.tar
│   └── VERSION
├── 8ea…b12/
├── d2e…fcd/
├── eda…695/
# config file
├── 4e0…20b.json
# reference to config file and layers files
└── manifest.json
```

Config

Docker’s JSON config file describes the environment that built the docker image and its history:

```
{
  "architecture": "amd64",
  "config": { ... },
  "container": "2e7...b3e",
  "container_config": { ... },
  "created": "2019-07-10T07:49:21.1663546Z",
  "docker_version": "18.09.2",
  "history": [
    {
      "created": "2019-06-18T22:51:33.33427803Z",
      "created_by": "/bin/sh -c #(nop) ADD file:4e6...098 in / "
    },
    ...
    {
      "created": "2019-07-10T07:49:21.1663546Z",
      "created_by": "/bin/sh -c echo \"Hello World\"",
      "empty_layer": true
    }
  ],
  "os": "linux",
  "rootfs": {
    "type": "layers",
    "diff_ids": [
      "sha256:ba9...51e",
      "sha256:fbd...ad6",
      "sha256:dda...d0b",
      "sha256:75e...5ca"
    ]
  }
}
```

Manifest

The manifest.json file describes the location of the layers and config file:

```
[
  {
    "Config": "4e0...20b.json",
    "RepoTags": null,
    "Layers": [
      "8ea...b12/layer.tar",
      "d2e...fcd/layer.tar",
      "478...d90/layer.tar",
      "eda...695/layer.tar"
    ]
  }
]
```

Layers

Each layer has a json file (which looks like the config file), a VERSION file with the string 1.0 (probably the packaging version), and a layer.tar file containing the image’s files.

The a and b images have different layer folder names even though the layer.tar files are exactly the same:

```
$> sha256sum a/eda...695/layer.tar
75e...5ca  a/eda...695/layer.tar

$> sha256sum b/4ed...42d/layer.tar
75e...5ca  b/4ed...42d/layer.tar
```

This layer.tar SHA is referenced inside the config file, and the location of the layer is in the manifest.
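The relationship between the manifest, the config file, and the layer hashes can be checked in a few lines of Python. This is a minimal sketch, not part of the original experiment: the function names are made up for illustration, and it assumes that the "Layers" list in manifest.json and the "diff_ids" list in the config file are in the same order, as they appear to be in the example above.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path):
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_layers(image_dir):
    """Compare each layer.tar hash with the matching diff_ids entry.

    Assumption: manifest "Layers" and config "rootfs.diff_ids" are listed
    in the same order. Returns a list of (layer_path, matches) tuples.
    """
    root = Path(image_dir)
    manifest = json.loads((root / "manifest.json").read_text())[0]
    config = json.loads((root / manifest["Config"]).read_text())
    diff_ids = config["rootfs"]["diff_ids"]
    results = []
    for layer_path, diff_id in zip(manifest["Layers"], diff_ids):
        actual = "sha256:" + sha256_file(root / layer_path)
        results.append((layer_path, actual == diff_id))
    return results
```

Calling check_layers("a") on the extracted directory should report a match for every layer if the assumption about ordering holds.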

The Digestive System

Let’s SHA256 the config files:

```
$> sha256sum a/4e0...20b.json
4e0...20b  a/4e0...20b.json
$> sha256sum b/e28...3fa.json
e28...3fa  b/e28...3fa.json
```

So this is where the digest comes from. To see what differs between the a and b images, we can use diff:

```
$> diff a/4e0…20b.json b/e28…3fa.json
27c27
<   "container": "2e7…b3e",
---
>   "container": "97a…49c",
54c54
<   "created": "2019-07-10T07:49:21.1663546Z",
---
>   "created": "2019-07-10T07:49:30.0860002Z",
79c79
<   "created": "2019-07-10T07:49:21.1663546Z",
---
>   "created": "2019-07-10T07:49:30.0860002Z",
```

The created timestamps and the container key cause the digests to differ. This is annoying because even two identical docker images will have different digests if they are built milliseconds apart.
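A line-based diff depends on how the JSON happens to be formatted. As an alternative, here is a small Python sketch that compares two parsed config files key by key and reports the path of every value that differs. The helper name json_diff is hypothetical, and the sample values are shortened versions of the ones in the diff output above.

```python
import json


def json_diff(left, right, path=""):
    """Yield (path, left_value, right_value) for every differing value."""
    if isinstance(left, dict) and isinstance(right, dict):
        # Walk the union of keys; a missing key shows up as None.
        for key in sorted(set(left) | set(right)):
            yield from json_diff(left.get(key), right.get(key), f"{path}/{key}")
    elif isinstance(left, list) and isinstance(right, list) and len(left) == len(right):
        for index, (a, b) in enumerate(zip(left, right)):
            yield from json_diff(a, b, f"{path}/{index}")
    elif left != right:
        yield path, left, right


a = json.loads('{"container": "2e7...b3e", "created": "2019-07-10T07:49:21.1663546Z", "os": "linux"}')
b = json.loads('{"container": "97a...49c", "created": "2019-07-10T07:49:30.0860002Z", "os": "linux"}')

# Reports /container and /created, but not /os.
for where, old, new in json_diff(a, b):
    print(where, old, new)
```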

Breaking the Config Digest

Let’s see if we can remove the differences between these images and create a new digest. Can we create a valid docker image if we remove the container key (not sure we need this) and change all the dates to 1970-01-01T00:00:00Z in the config file?
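The edit described here can also be scripted. The following is a sketch under stated assumptions rather than the exact procedure used in this post: normalize_config and serialize are hypothetical helper names, only the top-level "created" key and the "created" keys inside "history" are rewritten, and the compact, sorted-key serialization is an arbitrary choice made so that the output bytes are stable.

```python
import copy
import json

EPOCH = "1970-01-01T00:00:00Z"


def normalize_config(config):
    """Return a copy without "container" and with every "created" set to EPOCH."""
    result = copy.deepcopy(config)
    result.pop("container", None)
    if "created" in result:
        result["created"] = EPOCH
    for entry in result.get("history", []):
        if "created" in entry:
            entry["created"] = EPOCH
    return result


def serialize(config):
    """Serialize with sorted keys and no extra whitespace (an arbitrary, stable layout)."""
    return json.dumps(config, sort_keys=True, separators=(",", ":")).encode("utf-8")


a = {"container": "2e7...b3e", "created": "2019-07-10T07:49:21.1663546Z",
     "history": [{"created": "2019-07-10T07:49:21.1663546Z"}]}
b = {"container": "97a...49c", "created": "2019-07-10T07:49:30.0860002Z",
     "history": [{"created": "2019-07-10T07:49:30.0860002Z"}]}

print(serialize(normalize_config(a)) == serialize(normalize_config(b)))  # True
```

Because the digest is computed over the file's bytes, the serialization step matters as much as the field edits: the same JSON written with different whitespace would hash differently.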

Also, do the names of the files matter? I want to call the config file config.json and rename the layer folders to 1, 2, 3, 4. Both changes require updates to the references in the manifest.json file and the layer json files.
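The renaming step could look roughly like the Python sketch below. It is an illustration only: rename_image_files is a made-up name, the folders are numbered in the order they appear in the manifest (the post does not say which folder became which number), and the per-layer json files are deliberately left untouched, so their references would still need to be updated separately.

```python
import json
from pathlib import Path


def rename_image_files(image_dir):
    """Rename the config to config.json and the layer folders to 1, 2, 3, ...

    Rewrites manifest.json to match and returns the updated manifest.
    Assumption: folders are numbered in manifest order. The per-layer
    json files are NOT updated by this sketch.
    """
    root = Path(image_dir)
    manifest_path = root / "manifest.json"
    manifest = json.loads(manifest_path.read_text())
    entry = manifest[0]

    # Rename the config file and point the manifest at the new name.
    (root / entry["Config"]).rename(root / "config.json")
    entry["Config"] = "config.json"

    # Rename each layer folder and rebuild the "Layers" list.
    new_layers = []
    for number, layer in enumerate(entry["Layers"], start=1):
        old_folder = root / Path(layer).parent
        old_folder.rename(root / str(number))
        new_layers.append(f"{number}/{Path(layer).name}")
    entry["Layers"] = new_layers

    manifest_path.write_text(json.dumps(manifest))
    return manifest
```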

Now the folder looks like:

```
# layer folders
├── 1/
├── 2/
├── 3/
├── 4/
├── config.json
└── manifest.json
```

Let’s re-tar and docker load again:

```
$> tar -cf new-a.tar -C a/ .
$> docker load -i new-a.tar
Loaded image ID: sha256:24b...975
```

The digest equals the SHA256 of the config.json file, so this all looks correct and it runs!
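The same check can be done without sha256sum. This is a minimal Python sketch, assuming the image ID is simply "sha256:" followed by the hex SHA-256 of the config file's raw bytes, which is what the experiment above suggests; image_id_from_config is a hypothetical helper name.

```python
import hashlib


def image_id_from_config(config_bytes):
    """Return 'sha256:<hex>' for the raw bytes of an image config file."""
    return "sha256:" + hashlib.sha256(config_bytes).hexdigest()


# The hash covers the exact bytes, so the same JSON with different
# whitespace gives a different ID.
compact = b'{"os":"linux"}'
spaced = b'{"os": "linux"}'
print(image_id_from_config(compact) == image_id_from_config(spaced))  # False
```

To compare against the ID printed by docker load, read a/config.json in binary mode and pass its contents to image_id_from_config.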

Docker’s Digest

  1. The docker image’s digest is the SHA256 of its config file.
  2. It differs between builds because of the timestamps (which can be replaced) and the container key (which can be ignored).
  3. The file names and folder structure don’t matter.
  4. The manifest file references the layers and config, but is not included in the digest.

I can now understand and remove some of the apparent randomness in docker image digests. Ultimately, what I want is a workflow in which a Dockerfile that creates the same filesystem always produces the same digest. Knowing how a digest is created and being able to manipulate it is the first step.
