GitHub’s “Squash and Merge” doesn’t Squash and doesn’t Merge! Trade-offs with Merging
At Coinbase we use a GitHub-flow-ish workflow to collaborate on code and develop features:
- Create a branch
- Create/Edit/Delete/Rename files on branch
- Create a Pull Request (PR) merging the branch into master
- Get code reviewed and make requested changes
- Merge the PR
However, we are still having discussions around the final step. How should we merge pull requests?
The options are:
- Merge the pull request, git merge
- Manually squash the branch with git rebase or git reset, force push, then merge
- Use the “Squash and Merge” function in GitHub, basically git merge --squash branch
The core differences between these methods are how much friction they add for developers and what historical information they leave behind.
How much friction does each method add to the developer's workflow? No method should make a developer's job unreasonably hard or annoying, otherwise people will find workarounds or simply not follow any standard.
Using git blame, git show, and other code archaeology tools to see the context of a change can help a developer understand why a bug exists and how to fix it. The historical record of what happened is stored in both git and GitHub. What crumbs of information are left, where they live, and how accurate they are can make the difference between an hour's work and a week's.
Experimental Repository
Below is a bash function that creates a repository to experiment with:
create_example_repository() {
  # To ensure the same commit SHAs on all repositories
  export DATE="2017-01-01T01:01:01"
  export GIT_COMMITTER_DATE=$DATE

  # Initializing the repository with a README
  git init
  echo "# Repository" > README.md
  # printf writes the trailing blank line; bash's echo would print "\n" literally
  printf 'This is an example repository\n\n' >> README.md
  git add README.md
  git commit -m 'first commit' --date="$DATE"

  # Creating an example branch or PR
  git checkout -b new-branch
  echo "## Branch" >> README.md
  git add README.md
  git commit -m 'branch' --date="$DATE"

  # This branch has multiple commits
  echo "this is an example branch" >> README.md
  git add README.md
  git commit -m 'fix tests' --date="$DATE"
  git checkout master

  # To simulate distributed work, add another commit to master
  echo "# Repository" > README.md
  echo "This is an example repository" >> README.md
  printf 'That many people work on\n\n' >> README.md
  git add README.md
  git commit -m 'commit' --date="$DATE"
}
Running this function always creates exactly the same repository, including the commit SHAs (they will differ from person to person, because the committer name and email are part of every commit).
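To see why the SHAs are stable for one person but differ between people, here is a minimal sketch. It assumes a made-up identity (Example, example@example.com) and a hypothetical helper named first_commit_sha; neither comes from our actual setup. It builds the same one-commit repository three times, changing only the committer email on the third run.

```shell
set -eu

# Hypothetical identity and fixed dates: every input to the commit hash is pinned
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"
export GIT_AUTHOR_DATE="2017-01-01T01:01:01" GIT_COMMITTER_DATE="2017-01-01T01:01:01"

# Hypothetical helper: create a one-commit repository in a fresh temp
# directory and print the SHA of that commit
first_commit_sha() {
  dir=$(mktemp -d)
  (
    cd "$dir"
    git init -q
    printf '# Repository\n' > README.md
    git add README.md
    git commit -q -m 'first commit'
    git rev-parse HEAD
  )
}

sha_a=$(first_commit_sha)
sha_b=$(first_commit_sha)
# Same content, message and dates, but a different committer email
sha_c=$(export GIT_COMMITTER_EMAIL="someone.else@example.com"; first_commit_sha)

echo "a=$sha_a"
echo "b=$sha_b"
echo "c=$sha_c"
```

The first two SHAs should match and the third should not, which is the behaviour described above.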
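With git log --graph --oneline --all this repository shows two lines of history diverging from first commit: master with the extra commit named commit, and new-branch with branch and fix tests.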


Merge
create_example_repository
git merge --no-commit new-branch
git commit -m 'typical merging' --date=$DATE


This method leaves in the git history a record of exactly what happened. That record is not always useful though, as every mistake or misstep in the merged branch remains. This in turn makes git blame and git show less useful.
However, this is the easiest method to use. GitHub directly supports merging, and it requires no extra work, which is why most projects start out using this method. As the codebase grows and more people work on it, running git blame and seeing fix tests on half the lines will become annoying.
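The fix tests problem is easy to reproduce. The sketch below is a simplified stand-in for the experimental repository, not the exact function from this post: the identity and the NOTES.md file are assumptions added for illustration. It does a plain merge, then asks git how many parents the merge commit has and which commit git blame holds responsible for the last line of the README.

```shell
set -eu

# Throwaway repository; the identity is made up so no global git config is needed
repo=$(mktemp -d)
cd "$repo"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from this post
printf '# Repository\n' > README.md
git add README.md
git commit -q -m 'first commit'

# The PR branch, with a follow-up "fix tests" commit
git checkout -q -b new-branch
printf '## Branch\n' >> README.md
git commit -q -am 'branch'
printf 'this is an example branch\n' >> README.md
git commit -q -am 'fix tests'

# Unrelated work on master (NOTES.md is a hypothetical file), so the merge is a real one
git checkout -q master
printf 'notes\n' > NOTES.md
git add NOTES.md
git commit -q -m 'commit'

git merge -q --no-ff -m 'typical merging' new-branch

# A merge commit records both histories: it has two parents
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')
# Commit message that git blame attributes line 3 of the README to
blamed_subject=$(git blame --porcelain -L 3,3 README.md | sed -n 's/^summary //p')

echo "parents=$parent_count blamed=$blamed_subject"
```

The blamed commit is fix tests rather than anything describing the feature, which is exactly the noise that piles up with this method.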
Manual Squash and Merge
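create_example_repository
# Squash the branch: move new-branch back to the point where it left master,
# keep its changes in the working tree, then commit them as a single commit
git checkout new-branch
git reset "$(git merge-base master new-branch)"
git add README.md
git commit -m 'squashed' --date="$DATE"
# Merge the squashed branch as usual
git checkout master
git merge --no-commit new-branch
git commit -m 'squashed merge' --date="$DATE"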


This is the most difficult method to use as GitHub doesn’t directly support squashing a branch, so you have to become comfortable with git rebase and/or git reset. Also, force pushing the branch to GitHub can wipe out useful information like comments, making it difficult to look back and see why decisions were made.
The git history is very clean though: git blame and git show return accurate and concise branch and merge information.
This method is preferred by people who are comfortable with git and want a clean history. To reduce friction you have to use scripts, and this can make a difficult situation even more confusing. If we want other non-technical teams (like design, legal, and compliance) to contribute in our workflows, having such a technical strategy can add difficulty.
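For anyone who wants to try the force-push step without touching a real PR, here is a rough sketch. It is built on assumptions: a local bare repository stands in for GitHub, the identity is made up, and it uses git reset --soft (a variant of the git reset shown above that keeps the changes staged) together with git push --force-with-lease.

```shell
set -eu

work=$(mktemp -d)
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

# A local bare repository plays the role of GitHub (assumption for this sketch)
git init -q --bare "$work/remote.git"
git init -q "$work/repo"
cd "$work/repo"
git symbolic-ref HEAD refs/heads/master   # use the branch name from this post
git remote add origin "$work/remote.git"

printf '# Repository\n' > README.md
git add README.md
git commit -q -m 'first commit'

git checkout -q -b new-branch
printf '## Branch\n' >> README.md
git commit -q -am 'branch'
printf 'this is an example branch\n' >> README.md
git commit -q -am 'fix tests'
git push -q origin master new-branch      # the branch as reviewers saw it

commits_before=$(git rev-list --count master..origin/new-branch)

# Squash: --soft moves the branch pointer back but leaves the changes staged
git reset -q --soft "$(git merge-base master new-branch)"
git commit -q -m 'squashed'

# --force-with-lease rejects the push if the remote branch moved since we last saw it
git push -q --force-with-lease origin new-branch

commits_after=$(git rev-list --count master..origin/new-branch)
echo "before=$commits_before after=$commits_after"
```

The remote branch goes from two commits to one. The original branch and fix tests commits are no longer on the branch, which is how review context tied to them can get lost.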
Squash and Merge
create_example_repository
git merge --squash new-branch
git commit -m 'squashing done' --date=$DATE


The git merge man page says that --squash creates a state “as if a real merge happened (except for the merge information)”. In other words, it does not squash the commits on the branch, and it does not create a merge commit: the commit you make afterwards is an ordinary single-parent commit on master that contains the combined changes. The “Squash and Merge” function from GitHub replicates this, which is why it neither squashes nor merges.
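You can check both halves of that claim locally. The sketch below is a simplified stand-in for the experimental repository, with a made-up identity and a hypothetical NOTES.md file on master. It runs git merge --squash, then inspects the result.

```shell
set -eu

# Throwaway repository; the identity is made up so no global git config is needed
repo=$(mktemp -d)
cd "$repo"
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

git init -q
git symbolic-ref HEAD refs/heads/master   # use the branch name from this post
printf '# Repository\n' > README.md
git add README.md
git commit -q -m 'first commit'

git checkout -q -b new-branch
printf '## Branch\n' >> README.md
git commit -q -am 'branch'
printf 'this is an example branch\n' >> README.md
git commit -q -am 'fix tests'

# Unrelated work on master (NOTES.md is a hypothetical file)
git checkout -q master
printf 'notes\n' > NOTES.md
git add NOTES.md
git commit -q -m 'commit'

git merge -q --squash new-branch
git commit -q -m 'squashing done'

# "Doesn't merge": the new commit has one parent, so it is not a merge commit
parent_count=$(git rev-list --parents -n 1 HEAD | awk '{print NF - 1}')
# ...and git does not consider new-branch to be part of master's history
if git merge-base --is-ancestor new-branch master; then is_merged=yes; else is_merged=no; fi
# "Doesn't squash": the branch still has both of its original commits
branch_commits=$(git rev-list --count master..new-branch)

echo "parents=$parent_count merged=$is_merged branch_commits=$branch_commits"
```

One parent, a branch git still treats as unmerged, and two untouched commits on that branch: the changes landed on master, but none of the history did.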
Not having all the information makes the git history difficult to read, and it is difficult to find where a commit came from. Information like when the code was written (not when it was merged) and who wrote the code (not who merged the PR) is impossible to find without also going to GitHub. This can make tools designed for git difficult to integrate, as they must also retrieve information from GitHub.
GitHub information, like the pull request title and number, is added to the commit message. Also, the git history is super clean: if this is the only method used, it looks like there were never any branches. So it is very easy to use and creates a super clean (if inaccurate) history.
Summary
- Merge: History messy but accurate. Very Easy.
- Manual Squash then Merge: History is clean. Very Difficult.
- Squash and Merge: Clean but inaccurate history, fragmented between git and GitHub. Easy.
My personal bias is towards the simple merge, because I am comfortable with git history dumpster diving for information. But I typically work on projects with few developers, and I am too lazy to manually squash every branch.
On larger projects, where we are trying to get buy-in from other teams, I would lean towards squash-and-merge as some inaccuracy is a good exchange for ease of use and clean history.