Dumping and restoring PostgreSQL databases: a personal cheat sheet

I occasionally need to dump and restore PostgreSQL databases: infrequently enough that I tend to forget about all the useful CLI switches to the pg_dump and pg_restore commands, yet frequently enough that having to look up the manual page for them, and trying to remember which parameters and flags I need, is mildly annoying.

Sometimes I need to rebuild a PostgreSQL container with a newer major version, moving data over, or to copy data from a live system to a development instance to triage a bug, or to create a snapshot of a development instance before testing some tricky migrations... In these cases, dumping and restoring a db quickly can make the difference between a fearless little task and unnecessary stress scrambling to remember the right CLI incantation and to avoid accidental mistakes.

Likewise, most of my dump and restore needs are modest: tens of tables, usually just one schema, a few extensions, functions and triggers, and overall at most tens of GiB of data. Reaching out for anything more complex than the lovely CLI commands included in the PostgreSQL distribution would likely be overkill.

So, here's a tiny cheat sheet for my typical use cases, with some notes on what I find useful.

Dumping databases

Full dump

pg_dump -h localhost -p 5432 -U user -F t database > project_environment_$(date +%FT%T)_full.tar

Here I am creating a dump in tar format, uncompressed (-F t). If I need to move the dump to a different machine, especially over a slow connection, I may compress the file on the fly with xz or whatever is available.

I tend to always specify parameters such as -h and -p that may have defaults, to make sure I am not accidentally dumping a wrong database among the many that I may have running on my laptop at any time.

I use a simple naming convention for the destination file, including a project name/label, an environment label (this may be referencing a code branch, for example), a timestamp, and some suffix that can tell me more about the data in the dump if I look at the file after some time.

Full dump, only data (skipping migrations)

pg_dump -h localhost -p 5432 -U user -T 'migrations' -a -F t database > project_environment_$(date +%FT%T)_full_data-only.tar

Here I am creating a full dump as in the first example, but only including data (-a), without any schemas (data definitions), and skipping the migrations table, which in this case holds database migrations metadata. This is often useful when importing data into a fresh new database whose structure has already been set up (for example, a new application instance where the database schema is set up via migrations when the application starts - hence the skipping of the migrations table via -T 'migrations').

Skipping tables

pg_dump -h localhost -p 5432 -U user -T 'prefixA*' -T 'prefixB*' -T 'table_to_skip' -F t database > project_environment_$(date +%FT%T)_no-prefixA_no-prefixB_no-table_to_skip_no-migrations.tar

Here I am creating a dump but skipping some tables: all the tables whose name starts with prefixA or prefixB (-T 'prefixA*' -T 'prefixB*', using the * wildcard character, and quoting the table names to avoid accidental shell expansion of the wildcard) as well as the table table_to_skip.

Often databases may be seeded with stock data (for example, some spatial data imported through an ETL pipeline from external sources), and if I move data to a fresh new database, this one may already contain all the spatial data we need, so there is no need to move this over through a dump+restore.

Only some schemas

pg_dump -h localhost -p 5432 -U user -T 'prefixA*' -T 'prefixB*' -T 'table_to_skip' -T migrations -n public -F t database > project_environment_$(date +%FT%T)_public-schema-only_no-prefixA_no-prefixB_no-table_to_skip.tar

Same as above, but this time only dumping objects from the public schema (-n public).

Restricting the dump to one or more schemas may be useful, for example, when we don't want to touch objects in schemas that are managed by a SaaS cloud database product that uses specific schemas to hold state or product metadata.

Restoring databases

pg_restore -h localhost -p 5432 -U user -d database -c --if-exists < project_environment_$(date +%FT%T)_full.tar 2>&1 | tee project_environment__restore_$(date +%FT%T).log

Restoring is usually quite standard in my typical use cases.

Most of the time I want to make sure that any existing objects are first dropped before trying to create them: in this case -c --if-exists will do exactly this, but without complaining if the object is not there when pg_dump tries to drop it (maybe I'm restoring to a fresh new, clean database).

Since warnings and errors can scroll past quickly, especially when restoring large dumps, I use tee to capture stdout and stderr into a log file which I can then check carefully for anything that needs attention.

And that's it - this is mostly what I need to remember how to do, every now and then. Looking up previous uses of pg_dump and pg_restore in the shell history may help, but often I dump and restore across several different machines, most of which are short-lived, so shell history can only help to some extent. A quick cheat sheet with the meaning of the key flags and parameters should hopefully help to remind myself why I did a dump or a restore in a specific way a few months back.

A big thank you goes to my colleague Alicia, whose pg_dump and pg_restore incantations in shell scripts and makefiles across Vizzuality's repositories have been my actual reference for examples so far.

fearless engineering