How does go build compile the simplest Golang program? This post is here to answer that question.
The simplest Go program (I can think of) is `main.go`:

```go
package main

func main() {}
```
If we run `go build main.go`, it outputs an executable `main` that is 1.1 MB and does nothing. What did `go build` do to create such a useful binary?
`go build` has some arguments that are useful for seeing how it builds:

- **-work**: `go build` creates a temporary folder for work files. This argument prints the location of that folder and keeps it after the build instead of deleting it
- **-a**: Golang caches previously built packages. `-a` makes `go build` ignore the cache, so our build will print all steps
- **-p 1**: This limits `go build` to running one command at a time, so the log output is linear
- **-x**: `go build` is a wrapper around other Golang tools like `compile`. `-x` outputs the commands and arguments that are sent to these tools
Running `go build -work -a -p 1 -x main.go` will output not only the `main` binary, but also a lot of logs describing exactly what `go build` did to create `main`.
The log starts with:

```
WORK=/var/folders/rw/gtb29xf92fv23f0zqsg42s840000gn/T/go-build940616988
```
This is the work directory, whose structure looks like:

```
├── b001
│   ├── _pkg_.a
│   ├── exe
│   ├── importcfg
│   └── importcfg.link
├── b002
│   └── ...
├── b003
│   └── ...
├── b004
│   └── ...
├── b006
│   └── ...
├── b007
│   └── ...
└── b008
    └── ...
```
What are these incrementing directory numbers?
`go build` defines an action graph of tasks that need to be completed. Each action in this graph gets its own sub-directory (defined in [NewObjdir](https://github.com/golang/go/blob/master/src/cmd/go/internal/work/action.go#L318)). The first node in the graph, `b001`, is the root task that compiles the main binary. Each dependent action has a higher number, the final one being `b008`. (I don't know where `b005` went; I assume it's OK.)

The first action to be executed is the leaf of the graph, `b008`:
```
mkdir -p $WORK/b008/
cat >$WORK/b008/importcfg << 'EOF'
# import config
EOF
cd /<..>/src/runtime/internal/sys
/<..>/compile -o $WORK/b008/_pkg_.a -trimpath "$WORK/b008=>" -p runtime/internal/sys -std -+ -complete -buildid gEtYPexVP43wWYWCxFKi/gEtYPexVP43wWYWCxFKi -goversion go1.14.7 -D "" -importcfg $WORK/b008/importcfg -pack -c=16 ./arch.go ./arch_amd64.go ./intrinsics.go ./intrinsics_common.go ./stubs.go ./sys.go ./zgoarch_amd64.go ./zgoos_darwin.go ./zversion.go
/<..>/buildid -w $WORK/b008/_pkg_.a
cp $WORK/b008/_pkg_.a /<..>/Caches/go-build/01/01b…60a-d
```
The `b008` action:

- creates the action directory (all actions do this, so I ignore it from here on)
- creates the `importcfg` file to be used by the `compile` tool (it lists no packages)
- changes the directory to the [runtime/internal/sys](https://golang.org/pkg/runtime/internal/sys/) package's source folder. This package contains constants used by the runtime
- compiles this package
- uses `buildid` to write (`-w`) metadata to the package and copies the package to the `go-build` cache (all packages are cached, so I ignore this from here on)
Let's break down the arguments sent to the `compile` tool (also described in `go tool compile --help`):

- `-o`: the output file
- `-trimpath "$WORK/b008=>"`: removes the `$WORK/b008` prefix from recorded source file paths, so the output does not depend on the temporary work directory
- `-p`: sets the package path used by `import`
- `-std`: compiling standard library (not sure what this does)
- `-+`: compiling runtime (another mystery)
- `-complete`: the compiler outputs a complete package (no C or assembly)
- `-buildid`: adds the build ID to the metadata (described below)
- `-goversion`: required version for the compiled package
- `-D`: the relative path for local imports is `""`
- `-importcfg`: import configuration file that refers to other packages
- `-pack`: create a package archive (`.a`) instead of an object file (`.o`)
- `-c`: concurrency of the build
- finally, a list of the files in the package
Most of these arguments are the same for all _compile_ calls, so I ignore them later.
The output of **b008** is the file **`$WORK/b008/_pkg_.a`** for **runtime/internal/sys**.

Let's dive into `buildid` for a second.
The `buildid` has the format `<actionid>/<contentid>`. It is used as an index to cache packages, which improves `go build` performance. The `<actionid>` is the hash of the action (all calls, arguments, and input files). The `<contentid>` is a hash of the output `.a` file. For each action, `go build` can look in the cache for content created by another action with the same `<actionid>`. This is implemented in buildid.go.
The `buildid` is stored as metadata in the file, so the file does not need to be hashed every time to get the `<contentid>`. You can see this ID with `go tool buildid <file>` (this also works on binaries).
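A small sketch of taking a package build ID apart may make the format concrete. `splitPackageBuildID` and `isPlaceholder` are hypothetical helpers, not Go APIs. They only handle two-part package IDs; the `-buildid` that `link` receives in `b001` has four parts. The placeholder check relies only on the pattern in this post's `b008` log: `compile` is handed an ID whose two halves are identical, and `buildid -w` later rewrites the second half.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// splitPackageBuildID splits a package build ID of the form
// "<actionid>/<contentid>" into its two halves.
func splitPackageBuildID(id string) (actionID, contentID string, err error) {
	parts := strings.Split(id, "/")
	if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
		return "", "", errors.New("not an <actionid>/<contentid> build ID: " + id)
	}
	return parts[0], parts[1], nil
}

// isPlaceholder reports whether the content half still repeats the action
// half, as in the ID compile writes before buildid -w rewrites it.
func isPlaceholder(id string) bool {
	a, c, err := splitPackageBuildID(id)
	return err == nil && a == c
}

func main() {
	for _, id := range []string{
		"gEtYPexVP43wWYWCxFKi/gEtYPexVP43wWYWCxFKi", // as passed to compile
		"gEtYPexVP43wWYWCxFKi/b-rPboOuD0POrlJWPTEi", // after buildid -w
	} {
		fmt.Println(id, "placeholder:", isPlaceholder(id))
	}
}
```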
In the `b008` log above, the build ID is set by the `compile` tool to `gEtYPexVP43wWYWCxFKi/gEtYPexVP43wWYWCxFKi`. This is just a placeholder. It is later overwritten by `go tool buildid -w` with the correct `gEtYPexVP43wWYWCxFKi/b-rPboOuD0POrlJWPTEi` before the package is cached.

The next action to be run is `b007`:
```
cat >$WORK/b007/importcfg << 'EOF'
# import config
packagefile runtime/internal/sys=$WORK/b008/_pkg_.a
EOF
cd /<..>/src/runtime/internal/math
/<..>/compile
  -o $WORK/b007/_pkg_.a
  -p runtime/internal/math
  -importcfg $WORK/b007/importcfg
  …
  ./math.go
```
- This writes the `importcfg`, but it includes the line `packagefile runtime/internal/sys=$WORK/b008/_pkg_.a`. This means `b007` depends on the output of `b008`.
- It compiles the [runtime/internal/math](https://golang.org/pkg/runtime/internal/math/) package. If you inspect [math.go](https://golang.org/src/runtime/internal/math/math.go), it has `import "runtime/internal/sys"`, which is built by `b008`.
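The `importcfg` format seen so far is simple enough to read with a few lines of Go. This minimal sketch understands only the `packagefile <importpath>=<file>` lines and the `#` comments that appear in these logs. The compiler's own reader accepts other directives as well. `parseImportcfg` is a hypothetical name.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseImportcfg maps import paths to archive files using the
// "packagefile <importpath>=<file>" lines of an importcfg. Blank lines and
// "#" comments are skipped; other directives are ignored in this sketch.
func parseImportcfg(cfg string) (map[string]string, error) {
	files := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(cfg))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		verb, rest := line, ""
		if i := strings.IndexByte(line, ' '); i >= 0 {
			verb, rest = line[:i], strings.TrimSpace(line[i+1:])
		}
		if verb != "packagefile" {
			continue
		}
		i := strings.IndexByte(rest, '=')
		if i <= 0 {
			return nil, fmt.Errorf("malformed packagefile line: %q", line)
		}
		files[rest[:i]] = rest[i+1:]
	}
	return files, sc.Err()
}

func main() {
	cfg := "# import config\npackagefile runtime/internal/sys=$WORK/b008/_pkg_.a\n"
	files, err := parseImportcfg(cfg)
	if err != nil {
		fmt.Println(err)
		return
	}
	for pkg, file := range files {
		// b007 depends on whichever action produced this file.
		fmt.Println(pkg, "->", file)
	}
}
```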
The output of **b007** is the file **`$WORK/b007/_pkg_.a`** for **runtime/internal/math**.

The next action is `b006`:
```
cat >$WORK/b006/go_asm.h << 'EOF'
EOF
cd /<..>/src/runtime/internal/atomic
/<..>/asm
  -I $WORK/b006/
  -I /<..>/go/1.14.7/libexec/pkg/include
  -D GOOS_darwin
  -D GOARCH_amd64
  -gensymabis
  -o $WORK/b006/symabis
  ./asm_amd64.s
/<..>/asm -I $WORK/b006/ -I /<..>/go/1.14.7/libexec/pkg/include -D GOOS_darwin -D GOARCH_amd64 -o $WORK/b006/asm_amd64.o ./asm_amd64.s
cat >$WORK/b006/importcfg << 'EOF'
# import config
EOF
/<..>/compile
  -o $WORK/b006/_pkg_.a
  -p runtime/internal/atomic
  -symabis $WORK/b006/symabis
  -asmhdr $WORK/b006/go_asm.h
  -importcfg $WORK/b006/importcfg
  …
  ./atomic_amd64.go ./stubs.go
/<..>/pack r $WORK/b006/_pkg_.a $WORK/b006/asm_amd64.o
```
This is where we step out of the normal `.go` files and start dealing with lower-level “Go assembly” `.s` files. `b006`:

- first makes the header file `go_asm.h`
- goes to the [runtime/internal/atomic](https://golang.org/pkg/runtime/internal/atomic/) package (a bunch of low-level functions)
- runs the [go tool asm](https://golang.org/cmd/asm/) tool (described with `go tool asm --help`) to build the `symabis` “Symbol Application Binary Interfaces (ABI) file” and then the object file `asm_amd64.o`
- uses `compile` to create the `_pkg_.a` file, including the `symabis` file and the header with `-asmhdr`
- uses `pack` to add the `asm_amd64.o` object file to the `_pkg_.a` package archive
The `asm` tool is called with the args:

- `-I`: includes the action's `$WORK/b006/` folder and the `pkg/include` folder. `include` has three files, `asm_ppc64x.h`, `funcdata.h` and `textflag.h`, all containing low-level definitions; e.g. `FIXED_FRAME` defines the size of the fixed part of a stack frame
- `-D`: adds a predefined symbol
- `-gensymabis`: generates the `symabis` file
- `-o`: the output file
The output of **b006** is **`$WORK/b006/_pkg_.a`** for **runtime/internal/atomic**.

Next is `b004`:
```
cd /<..>/src/internal/cpu
/<..>/asm ... -o $WORK/b004/symabis ./cpu_x86.s
/<..>/asm ... -o $WORK/b004/cpu_x86.o ./cpu_x86.s
/<..>/compile ... -o $WORK/b004/_pkg_.a ./cpu.go ./cpu_amd64.go ./cpu_x86.go
/<..>/pack r $WORK/b004/_pkg_.a $WORK/b004/cpu_x86.o
```
`b004` is the same as `b006`, for the package [internal/cpu](https://golang.org/pkg/internal/cpu/). First we assemble the `symabis` and object files, then we compile the Go files and pack the `.o` files into `_pkg_.a`.
The output of **b004** is **`$WORK/b004/_pkg_.a`** for **internal/cpu**.

The next action is `b003`:
```
cat >$WORK/b003/go_asm.h << 'EOF'
EOF
cd /<..>/src/internal/bytealg
/<..>/asm ... -o $WORK/b003/symabis ./compare_amd64.s ./count_amd64.s ./equal_amd64.s ./index_amd64.s ./indexbyte_amd64.s
cat >$WORK/b003/importcfg << 'EOF'
# import config
packagefile internal/cpu=$WORK/b004/_pkg_.a
EOF
/<..>/compile … -o $WORK/b003/_pkg_.a -p internal/bytealg ./bytealg.go ./compare_native.go ./count_native.go ./equal_generic.go ./equal_native.go ./index_amd64.go ./index_native.go ./indexbyte_native.go
/<..>/asm ... -o $WORK/b003/compare_amd64.o ./compare_amd64.s
/<..>/asm ... -o $WORK/b003/count_amd64.o ./count_amd64.s
/<..>/asm ... -o $WORK/b003/equal_amd64.o ./equal_amd64.s
/<..>/asm ... -o $WORK/b003/index_amd64.o ./index_amd64.s
/<..>/asm ... -o $WORK/b003/indexbyte_amd64.o ./indexbyte_amd64.s
/<..>/pack r $WORK/b003/_pkg_.a $WORK/b003/compare_amd64.o $WORK/b003/count_amd64.o $WORK/b003/equal_amd64.o $WORK/b003/index_amd64.o $WORK/b003/indexbyte_amd64.o
```
`b003` follows the same pattern as the previous actions `b004` and `b006`, for the package [internal/bytealg](https://golang.org/pkg/internal/bytealg/). The main complication with this package is that it has multiple `.s` files. These produce many `.o` object files, and each one needs to be added to the `_pkg_.a` file.
The output of **b003** is **`$WORK/b003/_pkg_.a`** for **internal/bytealg**.

The penultimate action, `b002`:
```
cat >$WORK/b002/go_asm.h << 'EOF'
EOF
cd /<..>/src/runtime
/<..>/asm
  …
  -o $WORK/b002/symabis
  ./asm.s ./asm_amd64.s ./duff_amd64.s ./memclr_amd64.s ./memmove_amd64.s ./preempt_amd64.s ./rt0_darwin_amd64.s ./sys_darwin_amd64.s
cat >$WORK/b002/importcfg << 'EOF'
# import config
packagefile internal/bytealg=$WORK/b003/_pkg_.a
packagefile internal/cpu=$WORK/b004/_pkg_.a
packagefile runtime/internal/atomic=$WORK/b006/_pkg_.a
packagefile runtime/internal/math=$WORK/b007/_pkg_.a
packagefile runtime/internal/sys=$WORK/b008/_pkg_.a
EOF
/<..>/compile
  -o $WORK/b002/_pkg_.a
  …
  -p runtime
  ./alg.go ./atomic_pointer.go ./cgo.go ./cgocall.go ./cgocallback.go ./cgocheck.go ./chan.go ./checkptr.go ./compiler.go ./complex.go ./cpuflags.go ./cpuflags_amd64.go ./cpuprof.go ./cputicks.go ./debug.go ./debugcall.go ./debuglog.go ./debuglog_off.go ./defs_darwin_amd64.go ./env_posix.go ./error.go ./extern.go ./fastlog2.go ./fastlog2table.go ./float.go ./hash64.go ./heapdump.go ./iface.go ./lfstack.go ./lfstack_64bit.go ./lock_sema.go ./malloc.go ./map.go ./map_fast32.go ./map_fast64.go ./map_faststr.go ./mbarrier.go ./mbitmap.go ./mcache.go ./mcentral.go ./mem_darwin.go ./mfinal.go ./mfixalloc.go ./mgc.go ./mgcmark.go ./mgcscavenge.go ./mgcstack.go ./mgcsweep.go ./mgcsweepbuf.go ./mgcwork.go ./mheap.go ./mpagealloc.go ./mpagealloc_64bit.go ./mpagecache.go ./mpallocbits.go ./mprof.go ./mranges.go ./msan0.go ./msize.go ./mstats.go ./mwbbuf.go ./nbpipe_pipe.go ./netpoll.go ./netpoll_kqueue.go ./os_darwin.go ./os_nonopenbsd.go ./panic.go ./plugin.go ./preempt.go ./preempt_nonwindows.go ./print.go ./proc.go ./profbuf.go ./proflabel.go ./race0.go ./rdebug.go ./relax_stub.go ./runtime.go ./runtime1.go ./runtime2.go ./rwmutex.go ./select.go ./sema.go ./signal_amd64.go ./signal_darwin.go ./signal_darwin_amd64.go ./signal_unix.go ./sigqueue.go ./sizeclasses.go ./slice.go ./softfloat64.go ./stack.go ./string.go ./stubs.go ./stubs_amd64.go ./stubs_nonlinux.go ./symtab.go ./sys_darwin.go ./sys_darwin_64.go ./sys_nonppc64x.go ./sys_x86.go ./time.go ./time_nofake.go ./timestub.go ./trace.go ./traceback.go ./type.go ./typekind.go ./utf8.go ./vdso_in_none.go ./write_err.go
/<..>/asm … -o $WORK/b002/asm.o ./asm.s
/<..>/asm … -o $WORK/b002/asm_amd64.o ./asm_amd64.s
/<..>/asm … -o $WORK/b002/duff_amd64.o ./duff_amd64.s
/<..>/asm … -o $WORK/b002/memclr_amd64.o ./memclr_amd64.s
/<..>/asm … -o $WORK/b002/memmove_amd64.o ./memmove_amd64.s
/<..>/asm … -o $WORK/b002/preempt_amd64.o ./preempt_amd64.s
/<..>/asm … -o $WORK/b002/rt0_darwin_amd64.o ./rt0_darwin_amd64.s
/<..>/asm … -o $WORK/b002/sys_darwin_amd64.o ./sys_darwin_amd64.s
/<..>/pack r $WORK/b002/_pkg_.a $WORK/b002/asm.o $WORK/b002/asm_amd64.o $WORK/b002/duff_amd64.o $WORK/b002/memclr_amd64.o $WORK/b002/memmove_amd64.o $WORK/b002/preempt_amd64.o $WORK/b002/rt0_darwin_amd64.o $WORK/b002/sys_darwin_amd64.o
```
`b002` is the reason for all the actions seen so far. It is the [**runtime**](https://golang.org/pkg/runtime/) package, which contains all the operations a Go binary needs to run. For example, it contains [mgc.go](https://golang.org/src/runtime/mgc.go), the implementation of garbage collection in Go (which also imports both `internal/cpu` from `b004` and `runtime/internal/atomic` from `b006`).
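Every Go binary links this package, including the empty `main.go` from the start of this post. A slightly larger program can call into the runtime's garbage collector directly. The sketch uses the real `runtime.GC`, `runtime.ReadMemStats` and `runtime.MemStats.NumGC` APIs; `gcCycles` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
)

// gcCycles forces a garbage collection and reports how many GC cycles
// the runtime has completed since the program started.
func gcCycles() uint32 {
	runtime.GC() // blocks until a full collection has finished
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.NumGC
}

func main() {
	fmt.Println("runtime:", runtime.Version(), runtime.GOOS+"/"+runtime.GOARCH)
	fmt.Println("completed GC cycles:", gcCycles())
}
```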
Although `b002` is probably the most complex package in the core library, it is built using the same pattern we have seen before; it just contains more files. It uses `asm`, `compile` and `pack` to build `_pkg_.a`.
The output of **b002** is **`$WORK/b002/_pkg_.a`** for **runtime**.

The final action, the one that pulls everything together, is `b001`:
```
cat >$WORK/b001/importcfg << 'EOF'
# import config
packagefile runtime=$WORK/b002/_pkg_.a
EOF
cd /<..>/main
/<..>/compile ... -o $WORK/b001/_pkg_.a -p main ./main.go
cat >$WORK/b001/importcfg.link << 'EOF'
packagefile command-line-arguments=$WORK/b001/_pkg_.a
packagefile runtime=$WORK/b002/_pkg_.a
packagefile internal/bytealg=$WORK/b003/_pkg_.a
packagefile internal/cpu=$WORK/b004/_pkg_.a
packagefile runtime/internal/atomic=$WORK/b006/_pkg_.a
packagefile runtime/internal/math=$WORK/b007/_pkg_.a
packagefile runtime/internal/sys=$WORK/b008/_pkg_.a
EOF
/<..>/link -o $WORK/b001/exe/a.out -importcfg $WORK/b001/importcfg.link -buildmode=exe -buildid=yC-qrh2sY_qI0zh2-NE7/owNzOBTqPO00FkqK0_lF/HPXqvMz_4PvKsQzqGWgD/yC-qrh2sY_qI0zh2-NE7 -extld=clang $WORK/b001/_pkg_.a
mv $WORK/b001/exe/a.out main
```
- First, it builds an `importcfg` that includes `runtime` (built in `b002`) and then compiles `main.go` to `_pkg_.a`.
- Then it creates `importcfg.link`, which includes the packages from all previous actions plus `command-line-arguments`, which refers to the `main` package we built. It then uses [link](https://golang.org/cmd/link/) to create an executable file.
- Finally, it renames the binary and moves it to `main`.
`link` has the new arguments:

- `-buildmode`: set to build an executable
- `-extld`: reference to the external linker
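With `b001` done, the whole graph can be checked. The sketch below hard-codes the dependencies read from the `importcfg` and `importcfg.link` files above, treating `b001`'s compile and link as one node. It computes a run order with a depth-first post-order, a standard topological sort. `deps` and `runOrder` are hypothetical names. The order it produces is valid but differs from the one in the log. Any order that runs each action after its dependencies would produce the same result.

```go
package main

import (
	"fmt"
	"sort"
)

// deps lists, for each action in this log, the actions whose _pkg_.a it reads.
var deps = map[string][]string{
	"b001": {"b002", "b003", "b004", "b006", "b007", "b008"}, // compile reads b002; link reads all
	"b002": {"b003", "b004", "b006", "b007", "b008"},
	"b003": {"b004"},
	"b004": nil,
	"b006": nil,
	"b007": {"b008"},
	"b008": nil,
}

// runOrder returns an order in which every action comes after all of its
// dependencies. It assumes the graph has no cycles (Go forbids import cycles).
func runOrder(graph map[string][]string) []string {
	var order []string
	done := map[string]bool{}
	var visit func(string)
	visit = func(a string) {
		if done[a] {
			return
		}
		done[a] = true
		for _, d := range graph[a] {
			visit(d)
		}
		order = append(order, a) // post-order: after all dependencies
	}
	keys := make([]string, 0, len(graph))
	for k := range graph {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic start order
	for _, k := range keys {
		visit(k)
	}
	return order
}

func main() {
	for i, a := range runOrder(deps) {
		fmt.Printf("%d. %s\n", i+1, a)
	}
}
```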
Finally, we have the output we want: the output of **b001** is the **main** binary.

#### Similarities with Bazel
Building an action graph to get efficient caching is the same idea the build tool Bazel uses for fast builds. Golang's `actionid` and `contentid` map neatly to the action cache and the content-addressable store (CAS) that Bazel uses for caching. Bazel is a product of Google, and so is Golang. It would make sense for them to share a philosophy of how to build software quickly and reliably.
In Bazel’s rules_go package you can see how it reimplements go build in its [builder](https://github.com/bazelbuild/rules_go/tree/master/go/tools/builders) code. This is a very clean implementation because the action graph, the folder management, and the caching are handled externally by Bazel.
#### The Next Steps
`go build` does a lot to compile a program that does nothing! I didn't even go into much detail about the tools (`compile`, `asm`) or their input and output files (`.a`, `.o`, `.s`). Also, we are still only compiling the most basic program. We could add complications like:
- importing another package, e.g. using `fmt` to print `Hello World` adds another 23 actions to the action graph
- having a `go.mod` file referencing external packages
- setting `GOOS` and `GOARCH` to other architectures, e.g. compiling to WASM has entirely different actions and arguments
Running go build and inspecting logs is a very top-down approach to learning how the Golang compiler works. It is a great starting point to dive into more resources like:
- Introduction to the Go compiler
- Go: Overview of the Compiler
- Go at Google: Language Design in the Service of Software Engineering
- Source code like [**build.go**](https://github.com/golang/go/blob/master/src/cmd/go/internal/work/build.go), the definition of the `go build` command, or [**compile/main.go**](https://github.com/golang/go/blob/master/src/cmd/compile/main.go), the entry point to `go tool compile`
There is a lot of information out there, so there is still lots to learn about compiling the simplest program.