Migrate to Managed Postgres using pg_dump and pg_restore
This guide provides step-by-step instructions on how to migrate your PostgreSQL database to ClickHouse Managed Postgres using the pg_dump and pg_restore utilities.
Prerequisites
- Access to your source PostgreSQL database.
- `pg_dump` and `pg_restore` installed on your local machine. These are typically included with PostgreSQL installations. If not, you can download them from the PostgreSQL official website.
The setup
To go through the steps, let's use a sample RDS Postgres database as the source database. Something like this:
Here's what we're working with:
- Two tables - `events` and `users`. `events` has a million rows, and `users` has a thousand rows. `events` has an index.
- A view on top of the `events` table.
- A couple of sequences.
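If you'd like to follow along with a similar setup, a rough sketch of such a schema is below. The column definitions, index name, and view name are illustrative assumptions, not the exact objects used in this guide:

```sql
-- Illustrative schema only; your actual tables will differ.
CREATE TABLE users (
    id         bigserial PRIMARY KEY,   -- bigserial creates one of the sequences
    email      text NOT NULL,
    created_at timestamptz DEFAULT now()
);

CREATE TABLE events (
    id         bigserial PRIMARY KEY,   -- and the other sequence
    user_id    bigint REFERENCES users (id),
    event_type text NOT NULL,
    created_at timestamptz DEFAULT now()
);

-- The index on events and the view on top of it.
CREATE INDEX idx_events_user_id ON events (user_id);

CREATE VIEW recent_events AS
    SELECT * FROM events
    WHERE created_at > now() - interval '7 days';
```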
Create a dump of the source database
Now let's use pg_dump to create a dump file of the above objects. It's a simple command:
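A sketch of that command, using placeholder credentials:

```bash
pg_dump \
  --format directory \
  -f rds-dump \
  'postgresql://<user>:<password>@<host>:<port>/<database>'
```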
Here:
- Replace `<user>`, `<password>`, `<host>`, `<port>`, and `<database>` with your source database credentials. Most Postgres providers give you a connection string that you can use directly.
- `--format directory` specifies that we want the dump in a directory format, which is suitable for `pg_restore`.
- `-f rds-dump` specifies the output directory for the dump files. Note that this directory will be created automatically and should not exist beforehand.
- You can also parallelize the dump process by adding the `--jobs` flag followed by the number of parallel jobs you want to run, as shown below. For more details, refer to the pg_dump documentation.
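For instance, a dump with four parallel workers would just add that flag to the same command (four is an arbitrary choice here; pick a value that suits your source database and machine):

```bash
pg_dump \
  --format directory \
  --jobs 4 \
  -f rds-dump \
  'postgresql://<user>:<password>@<host>:<port>/<database>'
```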
You can test this process once to get a sense of how long it takes and the size of the dump file.
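For the size, standard tooling works on the directory output; assuming the `rds-dump` directory from above:

```bash
# On-disk size of the directory-format dump.
du -sh rds-dump
```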
Here's what running this command looks like:
Migrate the dump to ClickHouse Managed Postgres
Now that we have the dump file, we can restore it to our ClickHouse Managed Postgres instance using pg_restore.
Create a Managed Postgres instance
First, ensure you have a Managed Postgres instance set up, preferably in the same region as the source. You can follow the quick guide here. Here's what we are going to spin up for this guide:
Restore the dump
Now, heading back to our local machine, we can use the pg_restore command to restore the dump to our Managed Postgres instance:
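A sketch of that command, pointing at the `rds-dump` directory created earlier and using a placeholder connection string for the target instance:

```bash
pg_restore --verbose \
  --dbname 'postgresql://<user>:<password>@<host>:<port>/<database>' \
  rds-dump
```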
You can get the connection string for your Managed Postgres instance from the ClickHouse Cloud console, explained very simply here.
Here too there are a couple of flags to note:
- `--verbose` provides detailed output during the restore process.
- You can also use the `--jobs` flag here to parallelize the restore process. For more details, refer to the pg_restore documentation.
In our case, it looks like this:
Verify the migration
Once the restore process is complete, you can connect to your Managed Postgres instance and verify that all your data and objects have been migrated successfully. You can use any PostgreSQL client to connect and run queries. Here's what our Managed Postgres setup looks like after the migration:
We can see that all our tables, indexes, views, and sequences are intact, and that the row counts match.
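If you prefer to verify from the command line, a quick spot check might look like this (`psql` is just one client option; the table names are the sample `events` and `users` tables from this guide):

```bash
# List tables, then compare row counts against the source database.
psql 'postgresql://<user>:<password>@<host>:<port>/<database>' \
  -c '\dt' \
  -c 'SELECT count(*) FROM events;' \
  -c 'SELECT count(*) FROM users;'
```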
Considerations
- Ensure that the PostgreSQL versions of the source and target databases are compatible. Using a pg_dump version older than the source server may lead to missing features or restore issues. Ideally, use a pg_dump from the same or a newer major version than the source database.
- Large databases may take a significant amount of time to dump and restore. Plan accordingly to minimize downtime, and consider using parallel dumps/restores (`--jobs`) where supported.
- Note that pg_dump / pg_restore do not replicate all database-related objects or runtime state. These include roles and role memberships, replication slots, server-level configuration (e.g. `postgresql.conf`, `pg_hba.conf`), tablespaces, and runtime statistics.
Next steps
Congratulations! You have successfully migrated your PostgreSQL database to ClickHouse Managed Postgres using pg_dump and pg_restore. You are now all set to explore Managed Postgres features and its integration with ClickHouse. Here's a 10-minute quickstart to get you going: