While developing the Good Enough Recommendations (GER) engine, I needed to Upsert a record in Postgres. Upsert is a function that updates a record if it exists, or inserts the record if it doesn’t. However, Postgres doesn’t come with upsert out-of-the-box, so I had to find out how best to implement it.
In this post, I will describe two methods to upsert records in Postgres, multi-query and single-statement. Then I will describe how I compared them to select a method for GER to use.
Note: this post uses CoffeeScript, and Q promises
GER’s Upsert Action
GER is a recommendations engine implemented in Node.js using Knex.js to query Postgres. GER’s API has the method set_action_weight(action,weight) which will:
- update the actions weight OR
- insert the action with that weight if it does not exist.
To implement this function GER needed to Upsert the action record. Given that upsert is not available in Postgres, I found two possible workarounds:
- implement it with multiple queries to Postgres in the application
- combine the queries into a single statement to be sent to Postgres
The Multi-Query Approach
The most straight forward approach to implementing upsert is directly in the application by calling Postgres multiple times.
To implement this the function set_action_weight(action, weight) first checks to see whether the action already exists using count:
@knex.select('*').from('actions') .where(action: action).count() .then( (count) => ...
If the action does not exist (i.e. the count is 0) then insert the record, otherwise update it:
if count == 0 @knex('actions') .insert(({action: action, weight: weight}) else @knex('actions').where(action: action) .update({weight: weight})
This code can introduce a race condition where if two actions are added at the same time it can cause both to try insert the record. Given actions must be unique in GER, this would cause a unique key violation error (whose code is 23505) that must be handled:
.catch( (error) -> if error.code != '23505' throw error )
The main problems with this method are:
- Although the code is reasonably straight forward, it requires ugly exception handling
- Each time this method is called, it will call the database multiple times, potentially impacting performance
The Single-Statement Approach
Upsert can also be implemented in a single Postgres statement, as described here.
Knex can be used to build query strings with the toString function, whose output can combined into a single statement.
First the insert statement is built:
insert = @knex("actions") .insert({action: action, weight: weight}) .toString()
To workaround this bug the values must be replaced with select:
insert.replace(/\svalues\s\(/," select ")[..-2]
The update query can then be built:
update = @knex("actions") .where(action: action) .update({weight: weight}) .toString()
The single statement can then be constructed to first lock the table, removing the possibility for a race condition. Then some fancy SQL can be used to first try update the record, and if no columns are are updated it will then try insert:
query = "BEGIN; LOCK TABLE actions IN SHARE ROW EXCLUSIVE MODE; WITH upsert AS (#{update} RETURNING *) #{insert} WHERE NOT EXISTS (SELECT * FROM upsert); COMMIT;"
Knex can then send this statement with:
@knex.raw(query)
This single-statement method for upsert executes entirely inside Postgres, and removes the possibility of a race condition. However, it adds some reasonably complex SQL that is Postgres specific and might be difficult to maintain.
Comparison of Methods
To properly compare these methods, the actual gain in performance should be measured. This is because optimisation without metrics is not optimisation. So, I ran this test against them:
ger = new GER() start_time = new Date().getTime() promises = [] for x in [1..1000] promises.push ger.set_action_weight('buy', 1)``q.all(promises) .then(-> end_time = new Date().getTime() time = end_time - start_time per_time = time/1000 console.log "#{per_time}ms" )
This test executes set_action_weight 1000 times concurrently. Waits for them all to complete and calculates the average time.
For the multi-query method it took on average 1.16ms per call, and for the single statement it took 0.99ms.
This shows that the single-statement method is about 10% faster than multi-query.
Conclusion
I understand that the comparison experiment is not a realistic test of the two methods, but it does show that the single-statement is faster by more than just a little bit. This is why the single-statement method is used in GER, which will hopefully help keep it performant as its use scales.
Upsert is a common pattern that can be used across applications. So it is a handy tool to wield, especially if performance is a requirement.
References/Thanks
The Good Enough Recommendations (GER) engine.
Thanks Autaux for the image.
The Art of Web’s SQL: A basic UPSERT in PostgreSQL. This is an excellent article about upsert in Postgres.
Seven Databases in Seven Weeks: I should probably read this book.