A.5. Tungsten Replicator 5.3.0 GA (12 December 2017)

Version End of Life: 7 June 2019

Release 5.3.0 is an important feature release that contains some key new functionality for replication. In particular:

  • JSON data type column extraction support for MySQL 5.7 and higher.

  • Generated column extraction support for MySQL 5.7 and higher.

  • DDL translation support for heterogeneous targets, initially supporting DDL translation from MySQL to MySQL, Vertica, and Redshift targets.

  • Support for concentrating replicated data into a single target schema (with additional source schema information added to each table) for both HPE Vertica and Amazon Redshift targets.

  • Rebranded and updated support for Oracle extraction with the Oracle Redo Reader, including improvements to offboard deployment, more configuration options, and support for the deployment and installation of multiple offboard replication services within a single replicator.

This release also contains a number of important bug fixes and minor improvements to the product.

Improvements, new features and functionality

  • Behavior Changes

    • The way that information is logged has been improved so that it is easier to identify errors and their causes when looking at the logs. To achieve this, logging is now written to an additional file, one for each component, containing only messages at the WARNING or ERROR levels. The new file is replicator-user.log. The original file, trepsvc.log, remains unchanged.

      All log files have been updated to ensure that where relevant the service name for the corresponding entry is included. This should further help to identify and pinpoint issues by making it clearer what service triggered a particular logging event.

      Issues: CT-30, CT-69
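      For example, assuming a default installation layout (the path below is an assumption; adjust it to your deployment), the new error-focused log can be monitored directly:

```shell
# Watch only WARNING/ERROR events for the replicator; the installation
# path shown is illustrative and varies by deployment
shell> tail -f /opt/continuent/tungsten/tungsten-replicator/log/replicator-user.log
```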

    • Support for Java 7 (JDK or JRE 1.7) has been deprecated, and will be removed in the 6.0.0 release. The software is compiled using Java 8 with Java 7 compatibility.

      Issues: CT-252

    • Some JavaScript filters contained DOS-style line breaks; these have been corrected to use Unix-style line endings.

      Issues: CT-376

    • Support for JSON datatypes and generated columns within MySQL 5.7 and greater has been added to the MySQL extraction component of the replicator.

      Important

      Due to a MySQL bug in the way that JSON and generated columns are represented within the MySQL binary log, it is possible for the actual size of the data and the reported size to differ, which could cause data corruption. To account for this behavior and to prevent data inconsistencies, the replicator can be configured to either ignore, warn, or stop if the mismatch occurs.

      This can be set by modifying the property replicator.extractor.dbms.json_length_mismatch_policy.

      Until this problem is addressed within MySQL, tpm will still generate a warning about the issue, which can be ignored during installation by using the --skip-validation-check=MySQLGeneratedColumnCheck option.

      For more information on the effects of the bug, see MySQL Bug #88791.

      Issues: CT-5, CT-468
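      As a sketch, the mismatch policy could be set to one of the three values described above, and the installation-time warning skipped (the properties file name and service name are illustrative):

```shell
# In the service properties (e.g. static-alpha.properties), choose one of
# ignore, warn, or stop:
#   replicator.extractor.dbms.json_length_mismatch_policy=warn
# During installation, suppress the related validation warning:
shell> tpm install ... --skip-validation-check=MySQLGeneratedColumnCheck
```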

  • Installation and Deployment

    • The tpm command has been updated to correctly operate with CentOS 7 and higher. Due to an underlying change in the way IP configuration information was sourced, the extraction of the IP address information has been updated to use the ip addr command.

      Issues: CT-35

    • The THL retention setting is now checked in more detail during installation. When the --thl-log-retention option is configured when extracting from MySQL, the value is compared to the binary log expiry setting in MySQL (expire_logs_days). If the value is lower, a warning is produced to highlight the potential for loss of data.

      Issues: CT-91
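      For example, the MySQL expiry can be inspected and the THL retention set to at least the same period (the service name alpha and the 7-day values are illustrative):

```shell
# Check the binary log expiry in MySQL (days)
mysql> SHOW VARIABLES LIKE 'expire_logs_days';
# Configure a THL retention of at least that period
shell> tpm configure alpha --thl-log-retention=7d ...
```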

    • A new option, --oracle-redo-temp-tablespace has been added to configure the temporary tablespace within Oracle redo reader extractor deployments.

      Issues: CT-321
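      A hypothetical invocation (the service name and tablespace name are placeholders):

```shell
shell> tpm configure alpha --oracle-redo-temp-tablespace=ORARR_TEMP ...
```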

  • Command-line Tools

    • The size-related options to the thl list command, such as -sizes or -sizesdetail, now additionally output summary information for the selected THL events:

      Total ROW chunks: 8 with 7 updated rows (50%)
      Total STATEMENT chunks: 8 with 2552 bytes (50%)
      16 events processed

      A new option has also been added, -sizessummary, that only outputs the summary information.

      Issues: CT-433

      For more information, see thl list -sizessummary Command.
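      For example, to display only the summary totals shown above:

```shell
shell> thl list -sizessummary
```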

  • Oracle Replication

    • A new option for tpm has been added, --oracle-tns-port, which is an alias for --replication-port.

      Issues: CT-274

    • The fetcher and miner ports can now be explicitly set. Previously they were fixed as ports 7901 and 7902 respectively. Use the --oracle-redo-fetcher-port and --oracle-redo-miner-port options to set them.

      Issues: CT-290
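      A hypothetical configuration setting both ports explicitly (the service name is a placeholder; 7901 and 7902 were the previous fixed values):

```shell
shell> tpm configure alpha \
    --oracle-redo-fetcher-port=7901 \
    --oracle-redo-miner-port=7902 ...
```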

  • Heterogeneous Replication

    • The HPE Vertica applier has been updated and expanded so that data can be concentrated from multiple source schemas into a single schema, where all the source and target schemas share a common table structure. The new functionality relies on the new rowadddbname filter, and a new batch applier script that handles the concentration.

      This functionality also incorporates options to keep a long-term copy of all the CDC data generated by the replicator by copying the data to a secondary set of staging tables. Both this and the core target information are configurable during installation.

      Note

      Full documentation on using this feature is in preparation and will be available shortly.

      Issues: CT-95

    • Support has now been added for full DDL replication and translation, initially from MySQL sources through to Amazon Redshift and HPE Vertica targets. The functionality allows schemas and tables to be created, modified, and deleted without the need to use ddlscan, and without having to worry about making changes that stop replication until the structures can be changed.

      The DDL translation supports the following features:

      • Full replication of schema and table operations.

      • Configurable translation of data types, including size differences.

      • Automatically creates staging tables for batch-based appliers.

      • Support for centralized and long term schema replication.

      • Ability to add arbitrary columns to all replicated tables.

      • Ability to choose whether to apply different schema operations on specific schemas or tables. The following options can be controlled:

        • Creating schema

        • Creating table

        • Adding columns to existing table

        • Deleting columns from existing table

        • Modifying columns in existing table

        • Deleting table

        • Deleting schema

        For each operation, the operation can be applied, ignored, made to stop replication with an error, or applied with archiving. In the last case, a copy of the table is kept, and changes are applied only to the active table. This enables you to retain the existing data and structure so that analytics can continue on a known version of the table. The naming and format of the archived table can also be set.

        For operations that add or change columns, you can choose whether the value of the new column within the existing rows of the table is set to the default value or to an explicit value.

      • Data is automatically flushed and committed before table changes are made to ensure that replication does not stop. This process happens automatically, so replicating data, adding a column, and replicating further data does not stop replication, even if the data would normally fail because of table differences and batch applier timings.

      • Existing table schemas can be extracted and replicated automatically through to a target without requiring ddlscan to create the initial tables.

      Note

      Full documentation on using this feature is in preparation and will be available shortly.

      Issues: CT-131, CT-132

    • The Javascript files used for applying data into batch targets (Redshift, Hadoop, Cassandra, Vertica) have been updated and improved to ensure:

      • Field names are correctly escaped

      • Error messages now contain more information about the problem

      • Where relevant, the host database errors and CSV files are now kept in the event of an error to help identification of the underlying problem.

      These changes should make it easier to identify issues, and to prevent certain issues occurring during replication.

      Issues: CT-96, CT-235

    • The CSV writer module which is used in all batch-related appliers (Redshift, Hadoop, Vertica, Cassandra) has been updated so that it provides more information about the potential problem when a CSV write is identified as invalid.

      Issues: CT-236

    • Support for replicating into Hadoop environments where the underlying filesystem is protected by Kerberos security and authentication has been added to the Hadoop applier. A new file, hadoop_kerberos.js has been added to the distribution which should be edited and used in place of the normal hadoop.js batch file.

      Issues: CT-266

      For more information, see Section 6.4.3, “Replicating into Kerberos Secured HDFS”.
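      As a sketch, after editing hadoop_kerberos.js for the local Kerberos configuration, it can be selected in place of the default script. The --batch-load-template option name here is an assumption based on standard batch-applier configuration; verify it against your installation:

```shell
# Use the Kerberos-enabled batch script instead of the default hadoop.js
# (option name is an assumption; service name is a placeholder)
shell> tpm configure alpha --batch-load-template=hadoop_kerberos ...
```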

    • The Amazon Redshift applier has been updated and expanded so that data can be concentrated from multiple source schemas into a single schema, where all the source and target schemas share a common table structure. The new functionality relies on the new rowadddbname filter, and a new batch applier script that handles the concentration.

      Note

      Full documentation on using this feature is in preparation and will be available shortly.

      Issues: CT-408

  • Filters

    • A new filter, rowadddbname, has been added to the replicator. This filter adds the incoming schema name, and optional numeric hash value of the schema, to every row of THL row-based changes. The filter is designed to be used with heterogeneous and analytics applications where data is being concentrated into a single schema and where the source schema name will be lost during the concentration and replication process.

      In particular, it is designed to work in harmony with the new Redshift and Vertica based single-schema appliers where data from multiple, identical, schemas are written into a single target schema for analysis.

      Issues: CT-98
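      The effect of the filter can be illustrated outside the replicator. The sketch below is not the actual filter implementation; the function name, the column names dbname and dbname_hash, and the hash function are all illustrative assumptions:

```javascript
// Illustrative sketch of what a filter like rowadddbname does: append
// the source schema name (and an optional numeric hash of it) to each
// row. This is NOT the replicator's implementation; all names here are
// hypothetical.
function addDbNameColumn(rows, schemaName, withHash) {
  // Simple 32-bit string hash (illustrative only).
  let hash = 0;
  for (const ch of schemaName) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return rows.map(function (row) {
    const out = Object.assign({}, row, { dbname: schemaName });
    if (withHash) {
      out.dbname_hash = hash;
    }
    return out;
  });
}

// Rows extracted from schema "shard42" retain their source identity
// even after being concentrated into a single target schema.
const tagged = addDbNameColumn([{ id: 1 }, { id: 2 }], 'shard42', true);
```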

    • A new filter has been added, rowadddbname, which adds the source database name and optional database hash to every incoming row of data. This can be used to help identify source information when concentrating information into a single schema.

      Issues: CT-407

Bug Fixes

  • Installation and Deployment

    • An issue has been identified with the way certain operating systems now configure their open files limits, which can upset the checks within tpm that determine the open files limits configured for MySQL. To ensure that the open files limit has been set correctly, check the configuration of the service:

      1. Copy the system configuration:

        shell> sudo cp /lib/systemd/system/mysql.service /etc/systemd/system/
        shell> sudo vim /etc/systemd/system/mysql.service
      2. Add the following line to the end of the copied file:

        LimitNOFILE=infinity
      3. Reload the systemctl daemon:

        shell> sudo systemctl daemon-reload
      4. Restart MySQL:

        shell> service mysql restart

      With this configuration in place, MySQL should now take note of the open_files_limit configuration option.
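      The effective limit can then be verified from within MySQL:

```shell
mysql> SHOW VARIABLES LIKE 'open_files_limit';
```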

      Issues: CT-148

    • The check to determine whether triggers had been enabled within the MySQL data source was not executed correctly, meaning that warnings about unsupported triggers would not be raised.

      Issues: CT-185

    • When using tpm diag on a MySQL deployment, the MySQL error log would not be identified and included properly if the default datadir option was not /var/lib/mysql.

      Issues: CT-359

    • Installation when enabling security through SSL could fail intermittently during installation because the certificates would fail to get copied to the required directory during the installation process.

      Issues: CT-402

    • The Net::SSH libraries used by tpm have been updated to reflect the deprecation of the paranoid parameter.

      Issues: CT-426

    • Using a complex password, particularly one with single or double quotes, when specifying a password for tpm, could cause checks and the installation to raise errors or fail, although the actual configuration would work properly. The problem was limited to internal checks by tpm only.

      Issues: CT-440

  • Command-line Tools

    • The startall command would fail to correctly start the Oracle redo reader process.

      Issues: CT-283

    • The tpm command would fail to remove the Oracle redo reader user when using tpm uninstall.

      Issues: CT-299

    • The replicator stop command would not stop the Oracle redo reader process.

      Issues: CT-300

    • Within Vertica deployments, the internal identity of the applier was set incorrectly to PostgreSQL. This would make it difficult for certain internal processes to identify the true datasource type. The setting did not affect the actual operation.

      Issues: CT-452

  • Oracle Replication

    • Oracle deployments have been updated so that the replicator always runs in UTF-8 and the NLS_LANG setting is set correctly. This primarily affects CDC and Oracle applier deployments.

      Issues: CT-251

    • The ddlscan templates for Oracle to MySQL would incorrectly map NUMBER types into DECIMAL with an invalid size definition. This has been updated so that anything larger than a 19-digit NUMBER is mapped to a MySQL BIGINT.

      Issues: CT-259

    • The Oracle redo reader component has been rebranded to Continuent, Ltd, and changed internally to be identified as simply 'oracle redo reader'. This has changed the following elements within the product:

      • All components and references to vmrr and vmrrd have been changed to orarr and orarrd respectively.

      • All tpm options that contain vmware have been replaced with oracle, including:

        install-vmware-redo-reader install-oracle-redo-reader
        repl-install-vmware-redo-reader repl-install-oracle-redo-reader
      • All internal references, including the configuration parameters for the redo reader, have been updated to use orarr.

      • The default username and password used with the redo reader have changed from vmrruser to orarruser, and vmrruserpwd to orarruserpwd.

      • The template files used to configure the redo reader have been changed from vmrr_response_file to orarr_response_file, and from offboard_vmrr_response_file to offboard_orarr_response_file.

      • The vmrrd_wrapper has been renamed to orarrd_wrapper.

      Issues: CT-19, CT-282, CT-367

    • When running the orarrd command to execute the console, the command would fail and report:

      tungsten@dbora1 alpha$ orarrd_alpha console
      orarr is already started

      Issues: CT-397

    • The orarrd script contained incorrect environment variables for testing the validity of the installation. This could cause access to the Redo Reader console to fail.

      Issues: CT-401

  • Heterogeneous Replication

    • The Redshift applier would use a relative directory for the AWS configuration reference, but would refer to the wrong location.

      Issues: CT-375

    • The sample configuration file for Redshift mistakenly contained $ characters to indicate variables. These dollar signs are not required.

      Issues: CT-406

  • Core Replicator

    • When parsing THL data it was possible for the internal THL processing to lead to a java.util.ConcurrentModificationException. This indicated that the underlying THL event metadata structure used internally had changed between uses.

      Issues: CT-355