How “go build” Works
How does go build compile the simplest Golang program? This post is here to answer that question.
The simplest Go program (I can think of) is main.go:
package main

func main() {}
If we run go build main.go it outputs an executable, main, that is 1.1 MB and does nothing. What did go build do to create such a useful binary?
go build has some flags that are useful for seeing how it builds:
- -work: go build creates a temporary folder for work files. This flag prints the location of that folder and does not delete it after the build
- -a: Golang caches previously built packages. -a makes go build ignore the cache so our build will print all steps
- -p 1: This sets the concurrency to a single thread so the log output is linear
- -x: go build is a wrapper around other Golang tools like compile. -x outputs the commands and arguments that are sent to these tools
Running go build -work -a -p 1 -x main.go will output not only the main binary, but also a lot of logs describing exactly what build did to create main.
The logs start with:
WORK=/var/folders/rw/gtb29xf92fv23f0zqsg42s840000gn/T/go-build940616988
This is the work directory whose structure looks like:
├── b001
│ ├── _pkg_.a
│ ├── exe
│ ├── importcfg
│ └── importcfg.link
├── b002
│ └── ...
├── b003
│ └── ...
├── b004
│ └── ...
├── b006
│ └── ...
├── b007
│ └── ...
└── b008
└── ...
What are these incrementing directory numbers? go build defines an action graph of tasks that need to be completed. Each action in this graph gets its own sub-directory (defined in NewObjdir). The first node in the graph, b001, is the root task that compiles the main binary. Each dependent action has a higher number, the final one being b008. (I don’t know where b005 went; I assume it’s OK.)
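To make the idea of an action graph concrete, here is a minimal sketch in Go. The Action type and the number and executionOrder helpers are hypothetical simplifications, not the real cmd/go data structures, and the dependencies are the ones that appear in the importcfg files later in this post. The directory numbers it assigns will not match the real log exactly (it has no gap at b005), but it shows the two properties that matter: the root gets b001, and every dependency runs before the action that needs it.

```go
package main

import "fmt"

// Action is a hypothetical, simplified stand-in for a node in the build graph.
type Action struct {
	Package string
	Deps    []*Action
	Objdir  string // e.g. "b001", assigned by number
}

// number walks the graph from the root and gives each action the next free
// directory name, so the root gets b001 and its dependencies higher numbers.
func number(root *Action) {
	next := 1
	var visit func(a *Action)
	visit = func(a *Action) {
		if a.Objdir != "" {
			return // already numbered via another path
		}
		a.Objdir = fmt.Sprintf("b%03d", next)
		next++
		for _, d := range a.Deps {
			visit(d)
		}
	}
	visit(root)
}

// executionOrder returns the actions in post-order, so that every
// dependency comes before the action that depends on it.
func executionOrder(root *Action) []*Action {
	var order []*Action
	done := map[*Action]bool{}
	var visit func(a *Action)
	visit = func(a *Action) {
		if done[a] {
			return
		}
		done[a] = true
		for _, d := range a.Deps {
			visit(d)
		}
		order = append(order, a)
	}
	visit(root)
	return order
}

// buildGraph recreates the dependencies seen in this post's build log.
func buildGraph() *Action {
	sys := &Action{Package: "runtime/internal/sys"}
	math := &Action{Package: "runtime/internal/math", Deps: []*Action{sys}}
	atomic := &Action{Package: "runtime/internal/atomic"}
	cpu := &Action{Package: "internal/cpu"}
	bytealg := &Action{Package: "internal/bytealg", Deps: []*Action{cpu}}
	rt := &Action{Package: "runtime", Deps: []*Action{bytealg, cpu, atomic, math, sys}}
	return &Action{Package: "main", Deps: []*Action{rt}}
}

func main() {
	root := buildGraph()
	number(root)
	for _, a := range executionOrder(root) {
		fmt.Println(a.Objdir, a.Package)
	}
}
```

The real scheduler can also run independent actions in parallel, which is why the post passes -p 1 to keep the log linear.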
The first action to be executed is the leaf of the graph, b008:
mkdir -p $WORK/b008/
cat >$WORK/b008/importcfg << 'EOF'
# import config
EOF
cd /<..>/src/runtime/internal/sys
/<..>/compile
-o $WORK/b008/_pkg_.a
-trimpath "$WORK/b008=>"
-p runtime/internal/sys
-std
-+
-complete
-buildid gEtYPexVP43wWYWCxFKi/gEtYPexVP43wWYWCxFKi
-goversion go1.14.7
-D ""
-importcfg $WORK/b008/importcfg
-pack
-c=16
./arch.go ./arch_amd64.go ./intrinsics.go ./intrinsics_common.go ./stubs.go ./sys.go ./zgoarch_amd64.go ./zgoos_darwin.go ./zversion.go
/<..>/buildid -w $WORK/b008/_pkg_.a
cp $WORK/b008/_pkg_.a /<..>/Caches/go-build/01/01b...60a-d
The b008 action:
- creates the action directory (all actions do this so I ignore this later on)
- creates the importcfg file to be used by the compile tool (it is empty)
- changes the directory to the runtime/internal/sys package’s source folder. This package contains constants used by the runtime
- compiles this package
- uses buildid to write (-w) metadata to the package and copies the package to the go-build cache (all packages are cached so I ignore this later on)
Let’s break down the arguments sent to the compile tool (also described in go tool compile --help):
- -o is the output file
- -trimpath removes the prefix $WORK/b008=> from the source file paths (probably helps with debugging?)
- -p sets the package path used by import
- -std compiling standard library (not sure what this does)
- -+ compiling runtime (another mystery)
- -complete the compiler outputs a complete package (no C or assembly)
- -buildid adds the build id to the metadata (as defined here)
- -goversion required version for the compiled package
- -D the relative path for local imports is ""
- -importcfg import configuration file that refers to other packages
- -pack create a package archive (.a) instead of an object file (.o)
- -c concurrency of the build
- finished with a list of files in the package
Most of these arguments are the same for all compile calls, so I ignore them later.
The output of b008 is the file $WORK/b008/_pkg_.a for runtime/internal/sys.
Let’s dive into buildid for a second.
The buildid is in the format <actionid>/<contentid>. It is used as an index to cache packages to improve go build performance. The <actionid> is the hash of the action (all calls, arguments, and input files). The <contentid> is a hash of the output .a file. For each action, go build can look in the cache for contents created by another action with the same <actionid>. This is implemented in buildid.go.
The buildid is stored as metadata in the file so that the file does not need to be hashed every time to get the <contentid>. You can see this id with go tool buildid <file> (this also works on binaries).
In the log of b008 above, the buildid is set by the compile tool as gEtYPexVP43wWYWCxFKi/gEtYPexVP43wWYWCxFKi. This is just a placeholder and is later overwritten by go tool buildid -w with the correct gEtYPexVP43wWYWCxFKi/b-rPboOuD0POrlJWPTEi before being cached.
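The two halves of the build id can be illustrated with a small Go sketch. The shortHash and buildID helpers are assumptions made for illustration: they hash with SHA-256 and keep a short URL-safe prefix, which produces ids of the same shape as the ones in the log but does not reproduce cmd/go’s exact inputs or values. splitBuildID simply takes the first and last slash-separated parts.

```go
package main

import (
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

// shortHash is an assumed stand-in for the hashing done by go build:
// hash the input and keep a short, URL-safe prefix (20 characters).
func shortHash(data string) string {
	sum := sha256.Sum256([]byte(data))
	return base64.RawURLEncoding.EncodeToString(sum[:15])
}

// buildID joins the two halves in the <actionid>/<contentid> format.
func buildID(actionInputs, output string) string {
	return shortHash(actionInputs) + "/" + shortHash(output)
}

// splitBuildID returns the first and last slash-separated parts of an id.
func splitBuildID(id string) (actionID, contentID string) {
	parts := strings.Split(id, "/")
	return parts[0], parts[len(parts)-1]
}

func main() {
	placeholder := "gEtYPexVP43wWYWCxFKi/gEtYPexVP43wWYWCxFKi"
	final := "gEtYPexVP43wWYWCxFKi/b-rPboOuD0POrlJWPTEi"

	a1, c1 := splitBuildID(placeholder)
	a2, c2 := splitBuildID(final)
	fmt.Println("placeholder repeats the action id:", a1 == c1)
	fmt.Println("action id unchanged after rewrite:", a1 == a2)
	fmt.Println("content id replaced:", c1 != c2)

	// A made-up action and output, only to show the shape of an id.
	fmt.Println(buildID("compile -p runtime/internal/sys ./arch.go", "archive bytes"))
}
```

Because the action id depends only on the inputs, it is known before the compiler runs, while the content id can only be filled in once the output exists. That would explain why the log shows a placeholder first and a rewrite afterwards.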
The next action to be run is b007:
cat >$WORK/b007/importcfg << 'EOF'
# import config
packagefile runtime/internal/sys=$WORK/b008/_pkg_.a
EOF
cd /<..>/src/runtime/internal/math
/<..>/compile
-o $WORK/b007/_pkg_.a
-p runtime/internal/math
-importcfg $WORK/b007/importcfg
...
./math.go
- This writes the importcfg, but this time it includes the line packagefile runtime/internal/sys=$WORK/b008/_pkg_.a. This means b007 depends on the output of b008
- compiles the runtime/internal/math package. If you inspect math.go, it has import "runtime/internal/sys", which was built by b008
The output of b007 is the file $WORK/b007/_pkg_.a for runtime/internal/math.
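The importcfg format is simple enough to read by hand, but a short sketch shows how a tool might consume it. parseImportcfg is a hypothetical helper, not the compiler’s own parser, and it only understands the packagefile lines that appear in this post; any other directive is skipped.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseImportcfg reads "packagefile <import path>=<archive>" lines and
// returns a map from import path to archive file. Comments, blank lines
// and any other directives are skipped in this sketch.
func parseImportcfg(text string) map[string]string {
	pkgs := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if !strings.HasPrefix(line, "packagefile ") {
			continue
		}
		rest := strings.TrimPrefix(line, "packagefile ")
		i := strings.Index(rest, "=")
		if i < 0 {
			continue // malformed line
		}
		pkgs[rest[:i]] = rest[i+1:]
	}
	return pkgs
}

func main() {
	// The importcfg written for b007 in the log above.
	cfg := "# import config\npackagefile runtime/internal/sys=$WORK/b008/_pkg_.a\n"
	for path, file := range parseImportcfg(cfg) {
		fmt.Println(path, "->", file)
	}
}
```

When the compiler meets import "runtime/internal/sys" in math.go, a mapping like this is what tells it which archive to open.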
The next action is b006:
cat >$WORK/b006/go_asm.h << 'EOF'
EOF
cd /<..>/src/runtime/internal/atomic
/<..>/asm
-I $WORK/b006/
-I /<..>/go/1.14.7/libexec/pkg/include
-D GOOS_darwin
-D GOARCH_amd64
-gensymabis
-o $WORK/b006/symabis
./asm_amd64.s
/<..>/asm
-I $WORK/b006/
-I /<..>/go/1.14.7/libexec/pkg/include
-D GOOS_darwin
-D GOARCH_amd64
-o $WORK/b006/asm_amd64.o
./asm_amd64.s
cat >$WORK/b006/importcfg << 'EOF'
# import config
EOF
/<..>/compile
-o $WORK/b006/_pkg_.a
-p runtime/internal/atomic
-symabis $WORK/b006/symabis
-asmhdr $WORK/b006/go_asm.h
-importcfg $WORK/b006/importcfg
...
./atomic_amd64.go ./stubs.go
/<..>/pack r $WORK/b006/_pkg_.a $WORK/b006/asm_amd64.o
Here is where we step out of the normal .go files and start dealing with lower-level “Go assembly” .s files. b006:
- first makes the header file go_asm.h
- goes to the runtime/internal/atomic package (a bunch of low-level functions)
- runs the go tool asm tool (described with go tool asm --help) to build the symabis “Symbol Application Binary Interfaces (ABI)” file and then the object file asm_amd64.o
- uses compile to create the _pkg_.a file, including the symabis file and the header with -asmhdr
- uses pack to add the asm_amd64.o object file to the _pkg_.a package archive
The asm tool is called with the args:
- -I: includes the action’s b006 folder and the include folder. include has three files, asm_ppc64x.h, funcdata.h and textflag.h, all having low-level definitions, e.g. FIXED_FRAME defines the size of the fixed part of a stack frame
- -D: adds a predefined symbol
- -gensymabis: flag to generate the symabis file
- -o: the output file
The output of b006 is $WORK/b006/_pkg_.a for runtime/internal/atomic.
Next is b004:
cd /<..>/src/internal/cpu
/<..>/asm ... -o $WORK/b004/symabis ./cpu_x86.s
/<..>/asm ... -o $WORK/b004/cpu_x86.o ./cpu_x86.s
/<..>/compile ... -o $WORK/b004/_pkg_.a ./cpu.go ./cpu_amd64.go ./cpu_x86.go
/<..>/pack r $WORK/b004/_pkg_.a $WORK/b004/cpu_x86.o
b004 is the same as b006, but for the package internal/cpu. First we assemble the symabis and object files, then compile the Go files and pack the .o file into _pkg_.a.
The output of b004 is $WORK/b004/_pkg_.a for internal/cpu.
The next action is b003:
cat >$WORK/b003/go_asm.h << 'EOF'
EOF
cd /<..>/src/internal/bytealg
/<..>/asm ... -o $WORK/b003/symabis ./compare_amd64.s ./count_amd64.s ./equal_amd64.s ./index_amd64.s ./indexbyte_amd64.s
cat >$WORK/b003/importcfg << 'EOF'
# import config
packagefile internal/cpu=$WORK/b004/_pkg_.a
EOF
/<..>/compile ... -o $WORK/b003/_pkg_.a -p internal/bytealg ./bytealg.go ./compare_native.go ./count_native.go ./equal_generic.go ./equal_native.go ./index_amd64.go ./index_native.go ./indexbyte_native.go
/<..>/asm ... -o $WORK/b003/compare_amd64.o ./compare_amd64.s
/<..>/asm ... -o $WORK/b003/count_amd64.o ./count_amd64.s
/<..>/asm ... -o $WORK/b003/equal_amd64.o ./equal_amd64.s
/<..>/asm ... -o $WORK/b003/index_amd64.o ./index_amd64.s
/<..>/asm ... -o $WORK/b003/indexbyte_amd64.o ./indexbyte_amd64.s
/<..>/pack r $WORK/b003/_pkg_.a $WORK/b003/compare_amd64.o $WORK/b003/count_amd64.o $WORK/b003/equal_amd64.o $WORK/b003/index_amd64.o $WORK/b003/indexbyte_amd64.o
b003 is the same as the previous actions b004 and b006, but for the package internal/bytealg. The main complication with this package is that there are multiple .s files, which produce many .o object files that each need to be added to the _pkg_.a file.
The output of b003 is $WORK/b003/_pkg_.a for internal/bytealg.
The penultimate action, b002:
cat >$WORK/b002/go_asm.h << 'EOF'
EOF
cd /<..>/src/runtime
/<..>/asm
...
-o $WORK/b002/symabis
./asm.s ./asm_amd64.s ./duff_amd64.s ./memclr_amd64.s ./memmove_amd64.s ./preempt_amd64.s ./rt0_darwin_amd64.s ./sys_darwin_amd64.s
cat >$WORK/b002/importcfg << 'EOF'
# import config
packagefile internal/bytealg=$WORK/b003/_pkg_.a
packagefile internal/cpu=$WORK/b004/_pkg_.a
packagefile runtime/internal/atomic=$WORK/b006/_pkg_.a
packagefile runtime/internal/math=$WORK/b007/_pkg_.a
packagefile runtime/internal/sys=$WORK/b008/_pkg_.a
EOF
/<..>/compile
-o $WORK/b002/_pkg_.a
...
-p runtime
./alg.go ./atomic_pointer.go ./cgo.go ./cgocall.go ./cgocallback.go ./cgocheck.go ./chan.go ./checkptr.go ./compiler.go ./complex.go ./cpuflags.go ./cpuflags_amd64.go ./cpuprof.go ./cputicks.go ./debug.go ./debugcall.go ./debuglog.go ./debuglog_off.go ./defs_darwin_amd64.go ./env_posix.go ./error.go ./extern.go ./fastlog2.go ./fastlog2table.go ./float.go ./hash64.go ./heapdump.go ./iface.go ./lfstack.go ./lfstack_64bit.go ./lock_sema.go ./malloc.go ./map.go ./map_fast32.go ./map_fast64.go ./map_faststr.go ./mbarrier.go ./mbitmap.go ./mcache.go ./mcentral.go ./mem_darwin.go ./mfinal.go ./mfixalloc.go ./mgc.go ./mgcmark.go ./mgcscavenge.go ./mgcstack.go ./mgcsweep.go ./mgcsweepbuf.go ./mgcwork.go ./mheap.go ./mpagealloc.go ./mpagealloc_64bit.go ./mpagecache.go ./mpallocbits.go ./mprof.go ./mranges.go ./msan0.go ./msize.go ./mstats.go ./mwbbuf.go ./nbpipe_pipe.go ./netpoll.go ./netpoll_kqueue.go ./os_darwin.go ./os_nonopenbsd.go ./panic.go ./plugin.go ./preempt.go ./preempt_nonwindows.go ./print.go ./proc.go ./profbuf.go ./proflabel.go ./race0.go ./rdebug.go ./relax_stub.go ./runtime.go ./runtime1.go ./runtime2.go ./rwmutex.go ./select.go ./sema.go ./signal_amd64.go ./signal_darwin.go ./signal_darwin_amd64.go ./signal_unix.go ./sigqueue.go ./sizeclasses.go ./slice.go ./softfloat64.go ./stack.go ./string.go ./stubs.go ./stubs_amd64.go ./stubs_nonlinux.go ./symtab.go ./sys_darwin.go ./sys_darwin_64.go ./sys_nonppc64x.go ./sys_x86.go ./time.go ./time_nofake.go ./timestub.go ./trace.go ./traceback.go ./type.go ./typekind.go ./utf8.go ./vdso_in_none.go ./write_err.go
/<..>/asm ... -o $WORK/b002/asm.o ./asm.s
/<..>/asm ... -o $WORK/b002/asm_amd64.o ./asm_amd64.s
/<..>/asm ... -o $WORK/b002/duff_amd64.o ./duff_amd64.s
/<..>/asm ... -o $WORK/b002/memclr_amd64.o ./memclr_amd64.s
/<..>/asm ... -o $WORK/b002/memmove_amd64.o ./memmove_amd64.s
/<..>/asm ... -o $WORK/b002/preempt_amd64.o ./preempt_amd64.s
/<..>/asm ... -o $WORK/b002/rt0_darwin_amd64.o ./rt0_darwin_amd64.s
/<..>/asm ... -o $WORK/b002/sys_darwin_amd64.o ./sys_darwin_amd64.s
/<..>/pack r $WORK/b002/_pkg_.a $WORK/b002/asm.o $WORK/b002/asm_amd64.o $WORK/b002/duff_amd64.o $WORK/b002/memclr_amd64.o $WORK/b002/memmove_amd64.o $WORK/b002/preempt_amd64.o $WORK/b002/rt0_darwin_amd64.o $WORK/b002/sys_darwin_amd64.o
b002 is the reason for all the actions seen so far. It is the runtime package, containing all the operations needed for a Go binary to run. For example, it contains mgc.go, the implementation of garbage collection in Go (which also imports both internal/cpu from b004 and runtime/internal/atomic from b006).
b002, although probably the most complex package in the core library, is built using the same pattern we have seen before; it just contains more files. It uses asm, compile and pack to build _pkg_.a.
The output of b002 is $WORK/b002/_pkg_.a for runtime.
The final action, the one that pulls everything together, is b001:
cat >$WORK/b001/importcfg << 'EOF'
# import config
packagefile runtime=$WORK/b002/_pkg_.a
EOF
cd /<..>/main
/<..>/compile ... -o $WORK/b001/_pkg_.a -p main ./main.go
cat >$WORK/b001/importcfg.link << 'EOF'
packagefile command-line-arguments=$WORK/b001/_pkg_.a
packagefile runtime=$WORK/b002/_pkg_.a
packagefile internal/bytealg=$WORK/b003/_pkg_.a
packagefile internal/cpu=$WORK/b004/_pkg_.a
packagefile runtime/internal/atomic=$WORK/b006/_pkg_.a
packagefile runtime/internal/math=$WORK/b007/_pkg_.a
packagefile runtime/internal/sys=$WORK/b008/_pkg_.a
EOF
/<..>/link
-o $WORK/b001/exe/a.out
-importcfg $WORK/b001/importcfg.link
-buildmode=exe
-buildid=yC-qrh2sY_qI0zh2-NE7/owNzOBTqPO00FkqK0_lF/HPXqvMz_4PvKsQzqGWgD/yC-qrh2sY_qI0zh2-NE7
-extld=clang
$WORK/b001/_pkg_.a
mv $WORK/b001/exe/a.out main
- First it builds an importcfg that includes runtime, built in b002, to then compile main.go to _pkg_.a
- Then it creates importcfg.link, which includes all the previous actions’ packages plus command-line-arguments, referencing the main package we built. It then uses link to create an executable file
- Finally it renames and moves the binary to main
link has the new arguments:
- -buildmode: set to build an executable
- -extld: reference to the external linker
Finally, we have the output we want; the output of b001 is the main binary.
Similarities with Bazel
The building of an action graph in order to have efficient caching is the same idea the build tool Bazel uses for fast builds. Golang’s actionid and contentid map neatly to the action cache and the content-addressable store (CAS) that Bazel uses in caching. Bazel is a product of Google, and so is Golang. It would make sense that they would have a similar philosophy of how to build software quickly and reliably.
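The pairing of an action cache with a content-addressable store can be sketched in a few lines of Go. This is a toy, in-memory illustration under my own assumptions: the Cache type and its methods are hypothetical and mirror neither Bazel’s nor go build’s actual on-disk formats.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Cache is a toy two-level cache: an action cache that maps action ids to
// content ids, and a content-addressable store keyed by content id.
type Cache struct {
	actions map[string]string
	cas     map[string][]byte
}

// NewCache returns an empty cache.
func NewCache() *Cache {
	return &Cache{actions: map[string]string{}, cas: map[string][]byte{}}
}

// hash returns the hex-encoded SHA-256 of b.
func hash(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// Put records output as the result of the action and returns its content id.
func (c *Cache) Put(actionID string, output []byte) string {
	contentID := hash(output)
	c.cas[contentID] = output
	c.actions[actionID] = contentID
	return contentID
}

// Get returns the cached output for an action, if there is one.
func (c *Cache) Get(actionID string) ([]byte, bool) {
	contentID, ok := c.actions[actionID]
	if !ok {
		return nil, false
	}
	out, ok := c.cas[contentID]
	return out, ok
}

func main() {
	c := NewCache()
	// A made-up action description standing in for calls, arguments and inputs.
	action := hash([]byte("compile -p runtime/internal/sys ./arch.go"))

	if _, ok := c.Get(action); !ok {
		fmt.Println("miss: run the action and store the result")
		c.Put(action, []byte("archive bytes"))
	}
	if out, ok := c.Get(action); ok {
		fmt.Println("hit:", len(out), "bytes")
	}
}
```

Splitting the cache this way means two different actions that happen to produce identical output share one stored blob.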
In Bazel’s rules_go package you can see how it reimplements go build in its builder code. This is a very clean implementation because the action graph, the folder management, and the caching are handled externally by Bazel.
The Next Steps
go build does a lot to compile a program that does nothing! I didn’t even get into much specific detail about the tools (compile, asm) or their input and output files (.a, .o, .s). Also, we are still only compiling the most basic program. We could add complications like:
- importing another package, e.g. using fmt to print Hello World adds another 23 actions to the action graph
- having a go.mod file referencing external packages
- setting GOOS and GOARCH to other architectures, e.g. compiling to WASM has entirely different actions and arguments
Running go build and inspecting the logs is a very top-down approach to learning how the Golang compiler works. It is a great starting point to dive into more resources like:
- Introduction to the Go compiler
- Go: Overview of the Compiler
- Go at Google: Language Design in the Service of Software Engineering
- Source code like build.go, the definition of the go build command, or compile/main.go, the entry point to go tool compile
There is a lot of information out there, so there is still lots to learn about compiling the simplest program.