Update 2016-01-26: lightweight database migrations using PostgreSQL 2 details an update to the ideas outlined in this post, fixing various issues and streamlining the overall approach.
How can I run database migrations in a simple manner, with minimal dependencies, and using the database itself rather than a separate utility? To explore this, I use PostgreSQL. Not wanting to use anything particularly exclusive, I opt for PostgreSQL functions using SQL rather than PL/pgSQL. The latter would likely reduce duplication, but I don’t mind a little duplication so long as it’s centralised and gives me something in return. I’m also quite happy with using common utilities like cat
and psql
.
creating migrations
stage 0: initialisation
We need a way of storing metadata for migrations. We could use a dedicated table, but I opt for a more generic _meta
table in case I want to store other metadata directly in the database in a similar format.
-- database/0_init.sql
CREATE TABLE IF NOT EXISTS _meta (
key VARCHAR PRIMARY KEY,
value VARCHAR
);
INSERT INTO _meta (key, value) VALUES ('version', 0);
We could run this separately once, or just include it along with the migrations, if we’re not worried about the INSERT
failure on subsequent runs. (If you’re running with psql -1
, this would be a problem, but I’m specifying transactions explicitly.)
We create some temporary functions, which our migrations will be able to use:
-- database/0_init.sql
CREATE FUNCTION pg_temp.version() RETURNS integer AS $$
SELECT value::integer FROM _meta WHERE key = 'version';
$$ LANGUAGE sql;
CREATE FUNCTION pg_temp.version(integer) RETURNS boolean AS $$
UPDATE _meta SET value = $1 WHERE key = 'version';
SELECT TRUE;
$$ LANGUAGE sql;
CREATE FUNCTION pg_temp.migrate(integer) RETURNS boolean AS $$
SELECT CASE WHEN pg_temp.version() = $1 - 1 THEN NULL ELSE FALSE END;
$$ LANGUAGE sql;
The pg_temp.version()
function gets the current migration version. We need to cast the type, because of the generic _meta
table.
The pg_temp.version(integer)
function sets a migration version. It returns boolean
to be compatible with how we’ll run the migrations, and will indicate that the migration actually executed.
The pg_temp.migrate(integer)
function returns whether a particular migration should be run. It should only be run if it is strictly the next migration in the sequence. This deals with excluding migrations which have previously been run, and whilst protecting against running migrations in the wrong order accidentally.
stage 1: migrations
We create an example migration:
-- database/1_migrate_1.sql
CREATE FUNCTION pg_temp.migrate_1() RETURNS boolean AS $$
INSERT INTO _meta (key, value) VALUES ('tmp-1', 1);
SELECT pg_temp.version(1);
$$ LANGUAGE sql;
This is the format that each of our migrations should be in: in its own file if you prefer; numbered consecutively starting from 1
; wrapped in a temporary function named with the migration number suffix; returning boolean
, with the final statement a SELECT
calling the migration version setter. The INSERT INTO
statement is merely an example of migration work, here.
stage 2: execution
We need a way of calling the migration functions:
-- database/2_exec.sql
BEGIN;
SELECT
pg_temp.version() "version_f",
COALESCE(pg_temp.migrate(1), pg_temp.migrate_1()) "migrate_1",
pg_temp.version() "version_t"
;
COMMIT;
We wrap this in a transaction, and chain the calls in the correct order. With PostgreSQL, even DDL statements will get rolled back if the transaction fails. The calls to the pg_temp.version()
function report the starting and ending migration versions. The COALESCE()
handles only running a migration if it should be run next and hasn’t been run already, using lazy-evaluation of parameters.
We make a small script to ensure we don’t forget which order to call the files in, returning the entire SQL code to be executed:
# database.sql.sh
#!/bin/sh
cat \
database/0_init.sql \
database/1_migrate_*.sql \
database/2_exec.sql
This uses cat
to concatenate the files in correct order of stages: 0
, 1
, 2
. It doesn’t matter which order the individual migrations in stage 1
are concatenated, as they only define functions; it’s only in stage 2
they’re conditionally executed.
running migrations
Finally, we run the migrations against the database, using psql -x
to transpose the output to list one line per migration:
database.sql.sh | psql -x
CREATE TABLE
INSERT 0 1
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
BEGIN
-[ RECORD 1 ]
version_f | 0
migrate_1 | t
version_t | 1
COMMIT
The CREATE TABLE
and INSERT
lines indicate that the _meta
table was initialised. The CREATE FUNCTION
lines indicate that our migrations have been loaded. The RECORD 1
shows that the migration version started at 0
, ran migrate_1
, and ended at 1
.
If we run the migrations again:
database.sql.sh | psql -x
CREATE TABLE
ERROR: duplicate key value violates unique constraint "_meta_pkey"
DETAIL: Key (key)=(version) already exists.
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
CREATE FUNCTION
BEGIN
-[ RECORD 1 ]
version_f | 1
migrate_1 | f
version_t | 1
COMMIT
The _meta
table initialisation gives an error, but we can ignore that. The RECORD 1
shows that the migration version started at 1
, didn’t run migrate_1
, and ended at 1
.
concluding thoughts
I like the simplicity of this approach, with SQL files able to be used without being wrapped in another language, and requiring only cat
and psql
as dependencies. I don’t like the repetition in database/2_exec.sql
, although it does have its advantages in being able to define a custom name for each migration to display in the final report. Alternatively, a more complex function using PL/pgSQL could likely be used, wrapping the calls in a loop. The use of side-affects when calling using SELECT
is rather hacky; perhaps I’d prefer it split into multiple statements. That would also make me more comfortable about evaluation-order guarantees of parameters; COALESCE
provides the necessary assurances, but I’m not sure about SELECT
. We could wrap a single transaction around the outside of everything using psql -1
, but I’d be worried that I’d forget to type that one day and run migrations outside of a transaction by mistake.
The code in this post is also available in my coding notebook.