September 21, 2022 09:30 pm GMT

How to Migrate from MySQL to PostgreSQL RDBMS: An Enterprise Approach

This article was originally published on the JFrog blog.

1. Introduction to Migrating MySQL to PostgreSQL.

As part of JFrog's SaaS strategy to support only the PostgreSQL database (DB), JFrog has faced a need to migrate existing customers' databases in our SaaS from MySQL to PostgreSQL.

It was essential to develop a fully automated, reliable, and stable migration process with zero data loss and minimal downtime, capable of migrating DBs of tens or hundreds of GBs, while guaranteeing that existing applications can work transparently with the migrated DBs.

We developed a MySQL2PG process based on the pgloader open-source utility [https://github.com/dimitri/pgloader] to address this demand.

2. What are the components and topology for migrating DBs from MySQL to PostgreSQL?


The basic topology of the DB migration process includes the Source database (MySQL), the Destination database (PostgreSQL), and the Migration machine. The pgloader runs from the Migration machine, reading data from the Source database (MySQL) and writing it to the Destination database (PostgreSQL).

The Migration machine should have direct, reliable, and high throughput connectivity to the source and destination databases. The typical configuration of the Migration machine includes 16 CPUs and 64 GB RAM, network bandwidth of at least 10 Gbps, and storage of at least 500GB.
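A quick way to sanity-check a candidate Migration machine against this spec is to read the CPU and memory figures directly from the OS. The script below is only an illustrative sketch (Linux-only, reading /proc); the threshold values are the ones recommended above:

```shell
# Sketch: compare the Migration machine's resources against the
# recommended spec (16 CPUs, 64 GB RAM). Linux-only; reads /proc.
cpus=$(nproc)
mem_gb=$(awk '/MemTotal/ {printf "%d", $2/1024/1024}' /proc/meminfo)
echo "CPUs: $cpus, RAM: ${mem_gb} GB"
[ "$cpus" -ge 16 ]   || echo "WARN: fewer than the recommended 16 CPUs"
[ "$mem_gb" -ge 64 ] || echo "WARN: less than the recommended 64 GB RAM"
```

Network bandwidth and storage would need separate checks (e.g. df and an iperf-style measurement against the DB hosts).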

To guarantee compatibility of DB objects, their structure, and correspondence to specific data types, the DB migration process first runs DDL scripts suitable for the particular type/version of a product. It then copies data into the target DB schema prepared in advance.

pgloader doesn't export data into a dump, place the dump on the Migration machine, or occupy storage space with the DB's data. Instead, it creates a kind of pipeline in the Migration machine's RAM, selecting data from the Source DB and copying it into the Destination DB.
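This no-dump behavior can be pictured as an ordinary shell pipeline: the reader's output streams straight into the writer with no intermediate file on disk. The produce/consume functions below are hypothetical stand-ins, not pgloader internals:

```shell
# Illustration only: produce/consume are hypothetical stand-ins for
# "read rows from MySQL" and "write rows to PostgreSQL". No dump file
# is created; rows flow through the pipe in memory.
produce() { seq 1 5; }
consume() { while read -r row; do echo "copied row $row"; done; }
produce | consume
```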

3. What would be a typical OS for the Migration machine?

OS: Ubuntu 20.04

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.2 LTS
Release: 20.04
Codename: focal

4. What is recommended to install on the Migration machine?

sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install postgresql-13
sudo apt-get install -y pgloader
sudo apt-get install mysql-client

5. How to run pgloader to migrate a DB from MySQL to PostgreSQL?

Create pgloader configuration file:

cat pgloader.conf
LOAD DATABASE
FROM mysql://src_db_user:src_db_pwd@src_db_host/src_db_name?sslmode=<...>
INTO postgresql://dest_db_user:dest_db_pwd@dest_db_host/dest_db_name;

Run pgloader:

pgloader pgloader.conf

6. How to run MySQL2PG pgloader as a background process?

To avoid possible interruption of the MySQL2PG migration process, it's recommended to run pgloader as a background process. The procedure includes the following steps.

Create pgloader configuration file:

cat pgloader.conf
LOAD DATABASE
FROM mysql://src_db_user:src_db_pwd@src_db_host/src_db_name?sslmode=<...>
INTO postgresql://dest_db_user:dest_db_pwd@dest_db_host/dest_db_name;

Create a shell script to run pgloader as a background process:

cat run_pgloader_background.sh
nohup pgloader pgloader.conf 2>&1 &

Run the script:

./run_pgloader_background.sh

The run trace will be written to the file:

nohup.out

NOTES:

Usually

src_db_user = dest_db_user,
src_db_pwd = dest_db_pwd,
src_db_name = dest_db_name
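The background-run mechanics above can be tried safely with a stand-in command before pointing them at a real migration; in this sketch, `sleep 2` plays the role of the long pgloader run:

```shell
# Sketch of the nohup background pattern, with `sleep 2` standing in
# for the long-running pgloader process.
nohup sleep 2 > nohup.out 2>&1 &
pid=$!
kill -0 "$pid" && echo "stand-in running, pid $pid"   # still alive?
wait "$pid"
echo "stand-in finished, exit code $?"
```

The same pattern (backgrounding, capturing `$!`, probing with `kill -0`) applies unchanged when the command is `pgloader pgloader.conf`.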

7. What does a typical output trace of pgloader look like?

2022-01-08T20:47:55.046000+02:00 LOG report summary reset
             table name     errors       read   imported      bytes      total time       read      write
----------------------------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
        fetch meta data          0        278        278                     0.808s
         Create Schemas          0          0          0                     0.018s
       Create SQL Types          0          0          0                     0.010s
          Create tables          0        144        144                     1.137s
         Set Table OIDs          0         72         72                     0.011s
----------------------------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
my_db.my_table_number_1          0  319387354  319387354    38.0 GB      37m40.265s  37m40.242s  28m41.397s
my_db.my_table_number_2          0   15748659   15748659     1.9 GB       1m50.060s   1m50.039s   1m22.053s
my_db.my_table_number_3          0    3989089    3989089   336.5 MB         20.059s     20.009s    14.337s
(etc)
----------------------------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
COPY Threads Completion          0          4          4                 45m10.569s
         Create Indexes          0        165        165                1h2m48.214s
 Index Build Completion          0        165        165                  6m12.345s
        Reset Sequences          0          1          1                     0.123s
           Primary Keys          0         69         69                     0.789s
    Create Foreign Keys          0         41         41                  1m23.456s
        Create Triggers          0          0          0                     0.005s
        Set Search Path          0          1          1                     0.012s
       Install Comments          0          0          0                     0.000s
----------------------------------------  ---------  ---------  ---------  ---------  --------------  ---------  ---------
      Total import time            363230932  363230932   123.4 GB    1h23m45.678s

8. How do I ensure the migration has finished successfully, with zero data loss?

Verify the pgloader trace: it should contain zero errors, and the read and imported values should match.
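This check can be scripted. The awk sketch below scans per-table rows of a trace (a two-line sample stands in for nohup.out; the my_db prefix and column layout are taken from the sample output above) and flags any non-zero error count or read/imported mismatch:

```shell
# Sketch: verify a pgloader trace. Assumed column layout, as in the
# summary above: table_name errors read imported bytes time...
cat > sample_trace.txt <<'EOF'
my_db.my_table_number_1          0  319387354  319387354    38.0 GB      37m40.265s
my_db.my_table_number_2          0   15748659   15748659     1.9 GB       1m50.060s
EOF
awk '$1 ~ /^my_db\./ {
  if ($2 != 0)  { print "errors in " $1; bad = 1 }
  if ($3 != $4) { print "read/imported mismatch in " $1; bad = 1 }
}
END { print (bad ? "MIGRATION CHECK FAILED" : "MIGRATION CHECK OK") }' sample_trace.txt
```

For a real run, point the awk command at nohup.out and adjust the table-name pattern to your schema.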

9. How do I exclude tables from the pgloader process?

If the source DB contains tables that can be safely excluded from the migration, this can be done via pgloader's EXCLUDING TABLE NAMES MATCHING configuration.

Example:

LOAD DATABASE
FROM mysql://src_db_user:src_db_pwd@src_db_host/src_db_name?sslmode=<...>
INTO postgresql://dest_db_user:dest_db_pwd@dest_db_host/dest_db_name
EXCLUDING TABLE NAMES MATCHING 'table_to_exclude_one','table_to_exclude_two','table_to_exclude_three';

10. The "Heap exhausted, game over" issue.

The "Heap exhausted, game over" message from pgloader may indicate that pgloader's default 4GB of dedicated RAM is not enough for a given DB migration run.

The solution in such a case is to use a pgloader binary compiled with a larger DYNSIZE parameter. The DYNSIZE parameter modifies the default amount of memory the pgloader image will allow itself to use when running through data.

11. How to compile the pgloader to use more RAM?

The pgloader utility does not use all available RAM on the host machine; the amount of RAM pgloader can use is defined at compilation time. To dedicate more RAM, a customized version of pgloader should be compiled. The following step-by-step guide illustrates how to compile pgloader with a customized DYNSIZE parameter, which modifies the amount of memory the pgloader image will allow itself to use when running through data.

pgloader compilation
--------------------
Build pgloader from sources.

(1) Check OS:

dima@dima-VirtualBox:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.3 LTS
Release:        20.04
Codename:       focal

(2) Install git:

sudo apt update
sudo apt install git

(3) Clone pgloader:

mkdir my_pgloader
cd my_pgloader/
git clone https://github.com/dimitri/pgloader.git

(4) Install the packages necessary for the build:

sudo apt-get install sbcl unzip libsqlite3-dev make curl gawk freetds-dev libzip-dev

(5) Make the build:

make DYNSIZE=10240 pgloader

(6) make outputs a ./build/bin/pgloader file for us to use:

dima@dima-VirtualBox:~/my_pgloader/pgloader/build/bin$ pwd
/home/dima/my_pgloader/pgloader/build/bin
dima@dima-VirtualBox:~/my_pgloader/pgloader/build/bin$ ls -rtogla
total 69160
-rw-rw-r-- 1       70 Jan  7 17:17 .gitignore
drwxrwxr-x 5     4096 Jan  7 17:29 ..
-rwxr-xr-x 1 41918800 Jan  7 17:29 buildapp.sbcl
-rwxr-xr-x 1 29036008 Jan  7 17:29 pgloader
drwxrwxr-x 2     4096 Jan  7 17:29 .
dima@dima-VirtualBox:~/my_pgloader/pgloader/build/bin$ cp pgloader pgloader_dima_dynsize_10240
dima@dima-VirtualBox:~/my_pgloader/pgloader/build/bin$ ls -rtogla
total 97396
-rw-rw-r-- 1       70 Jan  7 17:17 .gitignore
drwxrwxr-x 5     4096 Jan  7 17:29 ..
-rwxr-xr-x 1 41918800 Jan  7 17:29 buildapp.sbcl
-rwxr-xr-x 1 29036008 Jan  7 17:29 pgloader
-rwxr-xr-x 1 29036008 Jan  7 17:33 pgloader_dima_dynsize_10240
drwxrwxr-x 2     4096 Jan  7 17:33 .
dima@dima-VirtualBox:~/my_pgloader/pgloader/build/bin$ ./pgloader_dima_dynsize_10240 --version
pgloader version "3.6.a94a0a3"
compiled with SBCL 2.0.1.debian

12. How to manage the "Connection reset by peer" pgloader issue?

The "Connection reset by peer" error message during a pgloader run usually indicates timeouts of pgloader connecting to MySQL (the Source DB).

To manage this issue, we recommend adding the following parameters to the pgloader configuration script:

SET MySQL PARAMETERS
net_read_timeout = '5000',
net_write_timeout = '5000'

13. How to improve the duration of the pgloader migration process?

To shorten the duration of the pgloader migration process, there are a few approaches:

Scale up source and destination databases
Ensure there is no load or heavy activity, and that there are enough storage, RAM, connections, and resources on both the source and destination databases
Use pgloader compiled with customized parameter DYNSIZE
Scale up the Migration machine to allow more RAM
Run pgloader with the following parameters:

SET PostgreSQL PARAMETERS
maintenance_work_mem to '512MB',
work_mem to '48MB'

14. How to configure pgloader to run stably and reliably on big databases / large amounts of data?

Use pgloader compiled with a customized DYNSIZE parameter. It allows dedicating more RAM to the pgloader process
Use the following pgloader configuration:

cat pgloader.conf
LOAD DATABASE
FROM mysql://db_user:db_pwd@src_db_host/db_name?sslmode=<...>
INTO postgresql://db_user:db_pwd@dest_db_host/db_name
WITH
data only, create no indexes,
workers = 8, concurrency = 1,
multiple readers per thread, rows per range = 10000,
batch rows = 10000
SET PostgreSQL PARAMETERS
maintenance_work_mem to '512MB',
work_mem to '48MB'
SET MySQL PARAMETERS
net_read_timeout = '5000',
net_write_timeout = '5000'
EXCLUDING TABLE NAMES MATCHING 'table_to_exclude_one','table_to_exclude_two','table_to_exclude_three'
ALTER SCHEMA 'db_schema_name' RENAME TO 'public';

15. What will a minimal pgloader configuration look like?

LOAD DATABASE
FROM mysql://db_user:db_pwd@src_db_host/db_name?sslmode=<...>
INTO postgresql://db_user:db_pwd@dest_db_host/db_name
WITH
batch size = 2048 kB,
batch rows = 2000,
prefetch rows = 2000
SET PostgreSQL PARAMETERS
maintenance_work_mem to '512MB',
work_mem to '48MB'
SET MySQL PARAMETERS
net_read_timeout = '5000',
net_write_timeout = '5000'
EXCLUDING TABLE NAMES MATCHING 'table_to_exclude_one','table_to_exclude_two','table_to_exclude_three'
ALTER SCHEMA 'db_schema_name' RENAME TO 'public';

16. Why is it important to monitor the source and destination databases and the Migration machine before, during, and after the process?

To guarantee the stable, reliable, and optimal work of pgloader, it's vital to monitor the source and destination databases before, during, and after migration. The DB machines should be powerful enough, with sufficient resources for pgloader's intensive activity. Neither the source nor the destination DB machine should be loaded with heavy activity, intensive CPU operations, or high reads and/or writes.

Note: any load or heavy activity on the Source or Destination DB machine directly impacts the DB migration's performance and success.

In addition, it's important to monitor the CPU, IOPS, memory, and network of the Migration machine during the process.
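A minimal way to take such snapshots of the Migration machine from the shell might look like the sketch below (Linux-only; load and memory come from /proc, disk usage from df). It covers CPU load, memory, and disk; network and IOPS would need additional tools:

```shell
# Sketch: one snapshot of load, available memory, and disk usage on
# the Migration machine. Schedule it (e.g. via watch or cron) during
# the migration and review the trend.
echo "load averages : $(cut -d' ' -f1-3 /proc/loadavg)"
echo "mem available : $(awk '/MemAvailable/ {print $2 " kB"}' /proc/meminfo)"
echo "disk usage    : $(df -h / | tail -1)"
```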

17. The importance of separating the Source DB, the Destination DB, and the Migration machines.

It's important to keep the Source DB instance, the Destination DB instance, and the Migration machine separate. Combining these components will impact DB migration performance and stability.

18. Can the destination PostgreSQL DB be queried during the migration process? Will we see the progress of the DB migration by querying the copied tables on the fly?

Although we can see live sessions on the Destination PostgreSQL DB during the migration, the actual data is committed only at the very end of the migration.

Querying any populated tables in the middle of a pgloader run is useless: their data is placed in dirty, not-yet-committed blocks. Only once pgloader has finished can we see the actual records in the target DB by querying select * from the migrated tables.


Original Link: https://dev.to/dm8ry/how-to-migrate-from-mysql-to-postgresql-rdbms-an-enterprise-approach-2hfj
