# Dumping and restoring PostgreSQL databases: a personal cheat sheet

I occasionally need to dump and restore PostgreSQL databases: infrequently enough that I tend to forget about all the useful CLI switches to the `pg_dump` and `pg_restore` commands, yet frequently enough that having to look up the manual page for them, and trying to remember which parameters and flags I need, is mildly annoying.

Sometimes I need to rebuild a PostgreSQL container with a newer major version, moving data over, or to copy data from a live system to a development instance to triage a bug, or to create a snapshot of a development instance before testing some tricky migrations... In these cases, dumping and restoring a db quickly can make the *difference between a fearless little task and unnecessary stress* scrambling to remember the right CLI incantation and to avoid accidental mistakes.

Likewise, *most of my dump and restore needs are modest*: tens of tables, usually just one schema, a few extensions, functions and triggers, and overall at most tens of GiB of data. Reaching out for anything more complex than the lovely CLI commands included in the PostgreSQL distribution would likely be overkill.

So, here's a tiny cheat sheet *for my typical use cases*, with some notes on what I find useful.

## Dumping databases

### Full dump

```bash
pg_dump -h localhost -p 5432 -U user -F t database > project_environment_$(date +%FT%T)_full.tar
```

Here I am creating a *dump in tar format*, uncompressed (`-F t`). If I need to move the dump to a different machine, especially over a slow connection, I may compress the file on the fly with `xz` or whatever is available.

I tend to always specify parameters such as `-h` and `-p` that may have defaults, *to make sure I am not accidentally dumping a wrong database* among the many that I may have running on my laptop at any time.

I use a *simple naming convention for the destination file*, including a project name/label, an environment label (this may be referencing a code branch, for example), a timestamp, and some suffix that can tell me more about the data in the dump if I look at the file after some time.

### Full dump, only data (skipping migrations)

```bash
pg_dump -h localhost -p 5432 -U user -T 'migrations' -a -F t database > project_environment_$(date +%FT%T)_full_data-only.tar
```

Here I am creating a full dump as in the first example, but *only including data* (`-a`), *without any schemas* (data definitions), and *skipping the* `migrations` *table*, which in this case holds database migrations metadata. This is often useful *when importing data into a fresh new database whose structure has already been set up* (for example, a new application instance where the database schema is set up via migrations when the application starts - hence the skipping of the `migrations` table via `-T 'migrations'`).

### Skipping tables

```bash
pg_dump -h localhost -p 5432 -U user -T 'prefixA*' -T 'prefixB*' -T 'table_to_skip' -F t database > project_environment_$(date +%FT%T)_no-prefixA_no-prefixB_no-table_to_skip_no-migrations.tar
```

Here I am creating a dump but *skipping some tables*: all the tables whose name starts with `prefixA` or `prefixB` (`-T 'prefixA*' -T 'prefixB*'`, using the `*` wildcard character, and quoting the table names to avoid accidental shell expansion of the wildcard) as well as the table `table_to_skip`.

Often *databases may be seeded with stock data* (for example, some spatial data imported through an ETL pipeline from external sources), and if I move data to a fresh new database, this one may already contain all the spatial data we need, so there is no need to move this over through a dump+restore.

### Only some schemas

```bash
pg_dump -h localhost -p 5432 -U user -T 'prefixA*' -T 'prefixB*' -T 'table_to_skip' -T migrations -n public -F t database > project_environment_$(date +%FT%T)_public-schema-only_no-prefixA_no-prefixB_no-table_to_skip.tar
```

Same as above, but this time *only dumping objects from the* `public` *schema* (`-n public`).

Restricting the dump to one or more schemas may be useful, for example, when *we don't want to touch objects in schemas that are managed by a SaaS cloud database product* that uses specific schemas to hold state or product metadata.

## Restoring databases

```bash
pg_restore -h localhost -p 5432 -U user -d database -c --if-exists < project_environment_$(date +%FT%T)_full.tar 2>&1 | tee project_environment__restore_$(date +%FT%T).log
```

Restoring is usually quite standard in my typical use cases.

Most of the time I want to make sure that any *existing objects are first dropped before trying to create them*: in this case `-c --if-exists` will do exactly this, but without complaining if the object is not there when `pg_dump` tries to drop it (maybe I'm restoring to a fresh new, clean database).

Since warnings and errors can scroll past quickly, especially when restoring large dumps, I use `tee` to *capture stdout and stderr into a log file* which I can then check carefully for anything that needs attention.

And that's it - this is mostly what I need to remember how to do, every now and then. Looking up previous uses of `pg_dump` and `pg_restore` in the shell history may help, but often I dump and restore across several different machines, most of which are short-lived, so shell history can only help to some extent. A quick cheat sheet with the meaning of the key flags and parameters should hopefully help to remind myself why I did a dump or a restore in a specific way a few months back.

*A big thank you goes to my colleague Alicia, whose* `pg_dump` *and* `pg_restore` *incantations in shell scripts and makefiles across Vizzuality's repositories have been my* ***actual*** *reference for examples so far.*
