Revisiting “Updating a Single Column with the Results of A Query in PostgreSQL”

Alana Thomas
Mar 5, 2021 · 3 min read

I wrote the eponymous article back in October 2019, and my technique has gotten more elegant since then, so here’s an update:

When I need to add a new column to an existing table and fill it with data from another table (a frequent-enough ask at my company that I've developed a strategy for it), I've found that it is:

  1. faster to target that specific column for loading than to reload the entire table, especially when the table has many fields and a lot of rows.
  2. easier on the database if you first load the data you need into a temporary table and then copy that table over to the real table.
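
The new column has to exist before anything can be loaded into it; a minimal sketch, assuming a text column named new_column on my_table (the name and type are placeholders):

ALTER TABLE my_table ADD COLUMN new_column text;

In Postgres, adding a nullable column with no default is a quick, catalog-only change, so this step is cheap even on large tables.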

In a timing exercise, re-loading an entire table took 45 seconds, while this approach shaved that time down to 14 seconds.
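
If you want to take timings like this yourself, psql's \timing toggle is the simplest tool; it reports the elapsed time after every statement:

\timing on
-- run the statement you want to measure; psql prints "Time: <ms>" after it finishes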

Sample Code for Target Column Load

UPDATE my_table
SET new_column = tt.source_value
FROM (
    SELECT source_id,
           value AS source_value
    FROM source_table
    WHERE my_criteria = true
) AS tt
WHERE my_table.id = tt.source_id;

First, create the query that fetches the data you want to fill into the new column; this will be your inner select. In this example, it is the lines:

SELECT source_id,
       value AS source_value
FROM source_table
WHERE my_criteria = true

value (aliased to source_value) is the data we want to populate our target table with. You also need to fetch whatever data you’re using to match up the rows between the source data and the targeted table; this is often the primary key — in this case, source_id.

Wrap this query in parentheses and alias it with AS. The alias lets you reference the inner query's columns from the outer UPDATE; see the line

SET new_column = tt.source_value

You can add JOINs/WHEREs/etc. in the inner or outer query as needed — adding more conditions can improve performance as well. For instance, in a test, formulating my inner query as:

SELECT source_id,
       value AS source_value
FROM source_table
WHERE my_criteria = true
  AND value IS NOT NULL

meaning that the database only grabbed rows where there was actually data to update, shaved a further three seconds off execution time (from 14 seconds to 11 seconds).
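
Putting the faster inner query back into the full statement:

UPDATE my_table
SET new_column = tt.source_value
FROM (
    SELECT source_id,
           value AS source_value
    FROM source_table
    WHERE my_criteria = true
      AND value IS NOT NULL
) AS tt
WHERE my_table.id = tt.source_id;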

The problem remains, however, that even though we've shaved quite a bit off the execution time compared to reloading the entire table, we're still locking our target table for eleven seconds, downtime that is likely unacceptable, especially in a user-facing application. Our next improvement uses a temporary table to minimize how long the real table is locked:
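
If you're curious what locks an update is actually holding, the pg_locks view can show you while the statement runs (run this from a second session; my_table is the placeholder table name from above):

SELECT l.locktype, l.mode, l.granted, a.query
FROM pg_locks l
JOIN pg_stat_activity a ON a.pid = l.pid
WHERE l.relation = 'my_table'::regclass;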

Sample Code for Target Column Load with Temporary Table

-- Column types here are examples; match them to your source columns.
CREATE TEMP TABLE temp_table (
    source_id integer,
    source_value text
);

INSERT INTO temp_table (source_id, source_value)
SELECT source_id,
       value AS source_value
FROM source_table
WHERE my_criteria = true;

UPDATE my_table
SET new_column = tt.source_value
FROM temp_table AS tt
WHERE my_table.id = tt.source_id;

DROP TABLE temp_table;

The most time-consuming processing (grabbing the source values) now touches only the temporary table we created for this update; the actual update of the real table locks it for a much shorter time.

When I took timings for a large query, I found that the time that the temporary table was locked during the heavy-duty processing was 2 minutes and 44 seconds. Meanwhile, the real table was locked during that update for less than a second (0.82 seconds).
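
As an aside, Postgres's CREATE TABLE ... AS syntax can collapse the CREATE and INSERT steps above into one statement, and it infers the column types from the query, so you don't have to spell them out:

CREATE TEMP TABLE temp_table AS
SELECT source_id,
       value AS source_value
FROM source_table
WHERE my_criteria = true;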
