Cross Replication Between PostgreSQL and MySQL







I will outline the cross-replication between PostgreSQL and MySQL, as well as the methods for setting up cross-replication between these two database servers. Typically, cross-replication databases are called homogeneous, and this is a convenient method for moving from one relational database server to another.







PostgreSQL and MySQL databases are considered relational, but with additional extensions they offer NoSQL features. Here we will discuss replication between PostgreSQL and MySQL, in terms of relational DBMS.







We will not describe the entire inner kitchen, only the basic principles, so that you get an idea of ​​how to configure replication between database servers, the advantages, limitations and usage scenarios.







Typically, replication between two identical database servers is performed either in binary mode or through requests between the master node (it is the publisher, master or active) and the slave (subscriber, waiting or passive). The purpose of replication is to provide a real-time copy of the master database on the slave side. In this case, data is transferred from the master to the slave, that is, from active to passive, because replication is performed only in one direction. But you can configure replication between two databases in both directions, so that data is transferred from the slave to the master in the active-active configuration. All this, including cascading replication, possibly between two or more identical database servers. The active-active or active-passive configuration depends on the need, the availability of such features in the initial configuration, or the use of external configuration solutions and existing trade-offs. .







The described configuration is possible between different database servers. The server can be configured to receive replicated data from another database server and still save real-time snapshots of the replicated data. MySQL and PostgreSQL offer most of these configurations on their own or with third-party extensions, including binary log methods, disk locks, and statement and string based methods.







Cross-replication between MySQL and PostgreSQL is needed for a single migration from one database server to another. These databases use different protocols, so they cannot be connected directly. To establish data exchange, you can use an external open source tool, for example pg_chameleon.







What is pg_chameleon



pg_chameleon is a replication system from MySQL to PostgreSQL in Python 3. It uses the open source library mysql-replication, also in Python. String images are extracted from MySQL tables and stored as JSONB objects in the PostgreSQL database, and then decrypted by the pl / pgsql function and played back in the PostgreSQL database.







Pg_chameleon features



Multiple MySQL schemas from the same cluster can be replicated to the same PostgreSQL target database with a one-to-many configuration

The names of the source and target schemas cannot match.

Replication data can be retrieved from a cascaded MySQL replica.

Tables that cannot replicate or generate errors are excluded.

Each replication function is controlled by daemons.

Control using parameters and configuration files based on YAML.







Example



Host vm1 vm2
OS version CentOS Linux 7.6 x86_64 CentOS Linux 7.5 x86_64
DB server version MySQL 5.7.26 PostgreSQL 10.5
DB Port 3306 5433
IP address 192.168.56.102 192.168.56.106


To get started, prepare all the necessary components for installing pg_chameleon. In this example, Python 3.6.8 is installed, which creates a virtual environment and activates it.







$> wget https://www.python.org/ftp/python/3.6.8/Python-3.6.8.tar.xz $> tar -xJf Python-3.6.8.tar.xz $> cd Python-3.6.8 $> ./configure --enable-optimizations $> make altinstall
      
      





After successfully installing Python3.6, you need to fulfill the other requirements, for example, create and activate a virtual environment. In addition, the pip module is updated to the latest version and is used to install pg_chameleon. In the commands below, pg_chameleon 2.0.9 is intentionally installed, although the latest version is 2.0.10. This is necessary to avoid new bugs in the updated version.







 $> python3.6 -m venv venv $> source venv/bin/activate (venv) $> pip install pip --upgrade (venv) $> pip install pg_chameleon==2.0.9
      
      





Then we call pg_chameleon (chameleon is a command) with the set_configuration_files argument to enable pg_chameleon and create default directories and configuration files.







 (venv) $> chameleon set_configuration_files creating directory /root/.pg_chameleon creating directory /root/.pg_chameleon/configuration/ creating directory /root/.pg_chameleon/logs/ creating directory /root/.pg_chameleon/pid/ copying configuration example in /root/.pg_chameleon/configuration//config-example.yml
      
      





Now we create a copy of config-example.yml as default.yml so that it becomes the default configuration file. A sample configuration file for this example is shown below.







 $> cat default.yml --- #global settings pid_dir: '~/.pg_chameleon/pid/' log_dir: '~/.pg_chameleon/logs/' log_dest: file log_level: info log_days_keep: 10 rollbar_key: '' rollbar_env: '' # type_override allows the user to override the default type conversion into a different one. type_override: "tinyint(1)": override_to: boolean override_tables: - "*" #postgres destination connection pg_conn: host: "192.168.56.106" port: "5433" user: "usr_replica" password: "pass123" database: "db_replica" charset: "utf8" sources: mysql: db_conn: host: "192.168.56.102" port: "3306" user: "usr_replica" password: "pass123" charset: 'utf8' connect_timeout: 10 schema_mappings: world_x: pgworld_x limit_tables: # - delphis_mediterranea.foo skip_tables: # - delphis_mediterranea.bar grant_select_to: - usr_readonly lock_timeout: "120s" my_server_id: 100 replica_batch_size: 10000 replay_max_rows: 10000 batch_retention: '1 day' copy_max_memory: "300M" copy_mode: 'file' out_dir: /tmp sleep_loop: 1 on_error_replay: continue on_error_read: continue auto_maintenance: "disabled" gtid_enable: No type: mysql skip_events: insert: - delphis_mediterranea.foo #skips inserts on the table delphis_mediterranea.foo delete: - delphis_mediterranea #skips deletes on schema delphis_mediterranea update:
      
      





The configuration file in this example is a sample file with pg_chameleon with minor changes according to the source and target environments, and the following is an overview of the various sections of the configuration file.







In the default.yml configuration file, there is a global settings section where you can control settings such as the location of the lock file, the location of the logs, the storage period of the logs, etc. The next section is the type override section, where the set is specified rules for overriding types during replication. The default example uses a type redefinition rule that converts tinyint (1) to a boolean value. In the next section, we indicate the details of connecting to the target database. In our case, this is the PostgreSQL database designated as pg_conn. In the last section, we indicate the source data, that is, the connection parameters of the source database, the mapping scheme of the source and target databases, the tables to be skipped, timeout, memory, packet size. Note that β€œsources” is indicated in the plural, that is, we can add several source databases for the same target to configure the many-to-one configuration.







The world_x database in the example contains 4 tables with rows that the MySQL community offers for the example. It can be downloaded here . The sample database comes in the form of tar and a compressed archive with instructions for creating and importing strings.







In MySQL and PostgreSQL databases, a special user is created with the same name usr_replica. In MySQL, he is granted additional rights to read all replicated tables.







 mysql> CREATE USER usr_replica ; mysql> SET PASSWORD FOR usr_replica='pass123'; mysql> GRANT ALL ON world_x.* TO 'usr_replica'; mysql> GRANT RELOAD ON *.* to 'usr_replica'; mysql> GRANT REPLICATION CLIENT ON *.* to 'usr_replica'; mysql> GRANT REPLICATION SLAVE ON *.* to 'usr_replica'; mysql> FLUSH PRIVILEGES;
      
      





On the PostgreSQL side, the db_replica database is created, which will accept changes from the MySQL database. PostgreSQL's usr_replica user is automatically configured as the owner of two schemas pgworld_x and sch_chameleon, which contain the actual replicated tables and tables with replication directories, respectively. The create_replica_schema argument is responsible for automatic configuration, as you will see below.







 postgres=# CREATE USER usr_replica WITH PASSWORD 'pass123'; CREATE ROLE postgres=# CREATE DATABASE db_replica WITH OWNER usr_replica; CREATE DATABASE
      
      





The MySQL database is configured with some changes to prepare it for replication, as shown below. You will need to restart the database server for the changes to take effect.







 $> vi /etc/my.cnf binlog_format= ROW binlog_row_image=FULL log-bin = mysql-bin server-id = 1
      
      





Now it is important to check the connection to both database servers so that there are no problems when executing pg_chameleon commands.







On the PostgreSQL node:







 $> mysql -u usr_replica -Ap'admin123' -h 192.168.56.102 -D world_x
      
      





On the MySQL node:







 $> psql -p 5433 -U usr_replica -h 192.168.56.106 db_replica
      
      





The following three pg_chameleon (chameleon) commands prepare the environment, add the source, and initialize the replica. The create_replica_schema argument in pg_chameleon creates the default schema (sch_chameleon) and the replication schema (pgworld_x) in the PostgreSQL database, as we already said. The add_source argument adds the original database to the configuration by reading the configuration file (default.yml), and in our case it is mysql, and init_replica initializes the configuration based on the parameters in the configuration file.







 $> chameleon create_replica_schema --debug $> chameleon add_source --config default --source mysql --debug $> chameleon init_replica --config default --source mysql --debug
      
      





The output from these three commands obviously indicates their successful execution. All crashes or syntax errors are indicated in simple and understandable messages with troubleshooting tips.







Finally, start replication using start_replica and get a success message.







 $> chameleon start_replica --config default --source mysql output: Starting the replica process for source mysql
      
      





Replication status can be requested using the show_status argument, and errors can be viewed using the show_errors argument.







Result.







As we said before, each replication function is controlled by daemons. To view them, we query the process table with the Linux ps command, as shown below.







Result.







Replication is not considered configured until we test it in real time, as shown below. We create a table, insert a couple of records in the MySQL database, and call the sync_tables argument in pg_chameleon to update the daemons and replicate the table with the records in the PostgreSQL database.







 mysql> create table t1 (n1 int primary key, n2 varchar(10)); Query OK, 0 rows affected (0.01 sec) mysql> insert into t1 values (1,'one'); Query OK, 1 row affected (0.00 sec) mysql> insert into t1 values (2,'two'); Query OK, 1 row affected (0.00 sec)
      
      





 $> chameleon sync_tables --tables world_x.t1 --config default --source mysql Sync tables process for source mysql started.
      
      





To confirm the test results, we query the table from the PostgreSQL database and display the rows.







 $> psql -p 5433 -U usr_replica -d db_replica -c "select * from pgworld_x.t1"; n1 | n2 ----+------- 1 | one 2 | two
      
      





If we perform the migration, the following pg_chameleon commands will be its end. The commands need to be executed after we make sure that the rows of all the target tables have been replicated, and the result will be a neatly migrated PostgreSQL database without reference to the source database or replication scheme (sch_chameleon).







 $> chameleon stop_replica --config default --source mysql $> chameleon detach_replica --config default --source mysql --debug
      
      





If desired, the following commands can remove the original configuration and replication scheme.







 $> chameleon drop_source --config default --source mysql --debug $> chameleon drop_replica_schema --config default --source mysql --debug
      
      





Benefits of pg_chameleon



Easy setup and configuration.

Convenient troubleshooting and anomaly detection with clear error messages.

You can add additional special tables to replication after initialization without changing the rest of the configuration.

You can configure several source databases for one target, and this is very convenient if you combine data from one or several MySQL databases into one PostgreSQL database.

You can not replicate selected tables.







Disadvantages of pg_chameleon



Only supported with MySQL 5.5 and higher as a source and PostgreSQL 9.5 and higher as a target database.

Each table must have a primary or unique key, otherwise the tables are initialized in the init_replica process, but not replicated.

One-way replication - only from MySQL to PostgreSQL. Therefore, it is suitable only for the active-passive scheme.

Only the MySQL database can be the source, and support for the PostgreSQL database as a source is only experimental and with limitations (learn more here )







Results for pg_chameleon



The replication method in pg_chameleon is great for migrating a database from MySQL to PostgreSQL. A significant minus is that replication is only one-way, so database specialists are unlikely to want to use it for anything other than migration. But the problem of one-way replication can be solved by yet another open source tool - SymmetricDS.







Read more in the official documentation here . Help on the command line can be found here .







SymmetricDS Overview



SymmetricDS is an open source tool that replicates any database to any other common database: Oracle, MongoDB, PostgreSQL, MySQL, SQL Server, MariaDB, DB2, Sybase, Greenplum, Informix, H2, Firebird and other cloud database instances, for example Redshift, and Azure, etc. Available functions: synchronization of databases and files, replication of several leading databases, filtered synchronization, conversion, and others. This is a Java tool and requires a standard release of JRE or JDK (version 8.0 or higher). Here you can record data changes on triggers in the source database and send them to the corresponding target database in the form of packets.







SymmetricDS Features



The tool is platform independent, that is, two or more different databases can exchange data.

Relational databases are synchronized by writing data changes, and file system-based databases use file synchronization.

Two-way replication using the Push and Pull methods based on a set of rules.

Data transmission is possible over secure networks and networks with low bandwidth.

Automatic recovery when nodes resume after a failure and automatic conflict resolution.

Cloud compatible and powerful extension APIs.







Example



SymmetricDS can be configured in one of two ways:

A master (parent) node that centrally coordinates data replication between two slave (child) nodes, and data exchange between child nodes is carried out only through the parent.

The active node (node ​​1) can exchange data for replication with another active node (node ​​2) without an intermediary.







In both cases, data is exchanged using Push and Pull. In this example, we will look at the active-active configuration. Describe the entire architecture for too long, so check out the guide to learn more about SymmetricDS.







Installing SymmetricDS is very simple: download the open source version of the zip file from here and extract it wherever you want. The table below provides information about the installation location and version of SymmetricDS in this example, as well as database versions, Linux versions, IP addresses and ports for both nodes.







Host vm1 vm2
OS version CentOS Linux 7.6 x86_64 CentOS Linux 7.6 x86_64
DB server version MySQL 5.7.26 PostgreSQL 10.5
DB Port 3306 5832
IP address 192.168.1.107 192.168.1.112
SymmetricDS Version SymmetricDS 3.9 SymmetricDS 3.9
SymmetricDS Installation Path /usr/local/symmetric-server-3.9.20 /usr/local/symmetric-server-3.9.20
SymmetricDS Node Name corp-000 store-001


Here we install SymmetricDS in /usr/local/symmetric-server-3.9.20, and different subdirectories and files will be stored right there. We are interested in the nested directories samples and engines. The samples directory contains sample configuration files with host properties, as well as sample SQL scripts to quickly start the demonstration.







In the samples directory, we see three configuration files with node properties - the name shows the nature of the node in a particular scheme.







 corp-000.properties store-001.properties store-002.properties
      
      





SymmetricDS has all the necessary configuration files for a 3-node base schema (option 1), and the same files can be used for a 2-node schema (option 2). Copy the desired configuration file from the samples directory to engines on the vm1 host. It turns out like this:







 $> cat engines/corp-000.properties engine.name=corp-000 db.driver=com.mysql.jdbc.Driver db.url=jdbc:mysql://192.168.1.107:3306/replica_db?autoReconnect=true&useSSL=false db.user=root db.password=admin123 registration.url= sync.url=http://192.168.1.107:31415/sync/corp-000 group.id=corp external.id=000
      
      





This node in the SymmetricDS configuration is called corp-000, and the database connection is processed by the mysql jdbc driver, which uses the connection string specified above and the login credentials. We connect to the replica_db database, and tables will be created during the creation of the schema. sync.url shows the site of communication with the node for synchronization.







Node 2 on the vm2 host is configured as store-001, and the rest is specified in the node.properties file below. The store-001 node runs the PostgreSQL database, and pgdb_replica is the database for replication. registration.url allows the vm2 host to contact the vm1 host and get configuration details from it.







 $> cat engines/store-001.properties engine.name=store-001 db.driver=org.postgresql.Driver db.url=jdbc:postgresql://192.168.1.112:5832/pgdb_replica db.user=postgres db.password=admin123 registration.url=http://192.168.1.107:31415/sync/corp-000 group.id=store external.id=001
      
      





The SymmetricDS pre-built example contains options for setting up two-way replication between two database servers (two nodes). The steps below are performed on the vm1 host (corp-000), which will create an example schema with 4 tables. Then running create-sym-tables with the symadmin command creates directory tables where rules and replication direction between nodes will be stored. Finally, sample data is loaded into the tables.







 vm1$> cd /usr/local/symmetric-server-3.9.20/bin vm1$> ./dbimport --engine corp-000 --format XML create_sample.xml vm1$> ./symadmin --engine corp-000 create-sym-tables vm1$> ./dbimport --engine corp-000 insert_sample.sql
      
      





In the example, the item and item_selling_price tables are automatically configured for replication from corp-000 to store-001, and the sale tables (sale_transaction and sale_return_line_item) are automatically configured for replication from store-001 to corp-000. Now create a schema in the PostgreSQL database on the vm2 host (store-001) to prepare it for receiving data from corp-000.







 vm2$> cd /usr/local/symmetric-server-3.9.20/bin vm2$> ./dbimport --engine store-001 --format XML create_sample.xml
      
      





Be sure to check that the MySQL database on vm1 has sample tables and SymmetricDS directory tables. Note that the SymmetricDS system tables (with the prefix sym_) are now available only on the corp-000 node, because there we ran the create-sym-tables command and will manage replication. And in the database on the store-001 node there will be only 4 example tables without data.







All. The environment is ready to run sym server processes on both nodes, as shown below.







 vm1$> cd /usr/local/symmetric-server-3.9.20/bin vm1$> sym 2>&1 &
      
      





Log entries are sent to the background log file (symmetric.log) in the log folder in the directory where SymmetricDS is installed, as well as to standard output. The sym server can now be initiated on the store-001 node.







 vm2$> cd /usr/local/symmetric-server-3.9.20/bin vm2$> sym 2>&1 &
      
      





If you run the sym server process on the vm2 host, it will create the SymmetricDS catalog tables in the PostgreSQL database as well. If you run the sym server process on both nodes, they coordinate with each other to replicate data from corp-000 to store-001. If after a few seconds we query all 4 tables on both sides, we will see that replication is successful. Or you can send the bootstrap to store-001 from corp-000 with the following command.







 vm1$> ./symadmin --engine corp-000 reload-node 001
      
      





At this point, a new record is inserted into the item table in the MySQL database on host corp-000 (host: vm1), and you can verify its replication to the PostgreSQL database on host store-001 (host: vm2). We see a Pull operation to move data from corp-000 to store-001.







 mysql> insert into item values ('22000002','Jelly Bean'); Query OK, 1 row affected (0.00 sec)
      
      





 vm2$> psql -p 5832 -U postgres pgdb_replica -c "select * from item" item_id | name ----------+----------- 11000001 | Yummy Gum 22000002 | Jelly Bean (2 rows)
      
      





To perform a Push operation to move data from store-001 to corp-000, insert a record in the sale_transaction table and verify that replication is complete.







Result.







We see the successful setup of two-way replication of sample tables between MySQL and PostgreSQL databases. To configure replication for new user tables, do the following: We create table t1 for an example and configure the rules for its replication as follows. So we only configure replication from corp-000 to store-001.







 mysql> create table t1 (no integer); Query OK, 0 rows affected (0.01 sec)
      
      





 mysql> insert into sym_channel (channel_id,create_time,last_update_time) values ('t1',current_timestamp,current_timestamp); Query OK, 1 row affected (0.01 sec)
      
      





 mysql> insert into sym_trigger (trigger_id, source_table_name,channel_id, last_update_time, create_time) values ('t1', 't1', 't1', current_timestamp, current_timestamp); Query OK, 1 row affected (0.01 sec)
      
      





 mysql> insert into sym_trigger_router (trigger_id, router_id, Initial_load_order, create_time,last_update_time) values ('t1', 'corp-2-store-1', 1, current_timestamp,current_timestamp); Query OK, 1 row affected (0.01 sec)
      
      





The configuration is then notified of a schema change, that is, the addition of a new table, using the symadmin command with the sync-triggers argument, which recreates the triggers to match the table definitions. Send-schema is executed to send the schema changes to the store-001 node, and t1 table replication is configured.







 vm1$> ./symadmin -e corp-000 --node=001 sync-triggers vm1$> ./symadmin send-schema -e corp-000 --node=001 t1
      
      





Benefits of SymmetricDS



Easy installation and configuration, including a ready-made set of files with parameters for creating a circuit with three or two nodes.

Cross-platform databases and platform independence, including servers, laptops and mobile devices.

Replicate any database to any other database locally, in a WAN or in the cloud.

The ability to work optimally with a couple of databases or several thousand for easy replication.

Paid version with a graphical interface and excellent support.







SymmetricDS Disadvantages



It is necessary to manually determine the rules and the direction of replication on the command line through SQL statements to load directory tables, which is inconvenient.

Setting up many tables for replication can be tedious if you do not use scripts to create SQL statements that define the rules and direction of replication.

Too much information is entered into the logs, and sometimes you need to clean up the log file so that it does not take up too much space.







SymmetricDS Summary



SymmetricDS allows you to configure two-way replication between two, three, or even several thousand nodes to replicate and synchronize files. This is a unique tool that independently performs many tasks, for example, automatic data recovery after prolonged downtime on a node, secure and efficient exchange of data between nodes via HTTPS, automatic conflict management based on a set of rules, etc. SymmetricDS replicates between any databases, therefore, it can be used for a wide variety of scenarios, including migration, upgrading, distribution, filtering, and data conversion across platforms.







The sample is based on the official SymmetricDS Quick Start Guide . This user guide details various concepts related to configuring replication using SymmetricDS.








All Articles