3.16. Continuent Tungsten 1.5.4 GA (Not yet released)

Release 1.5.4 is a maintenance release that adds important bug fixes to the Tungsten 1.5.3 release currently in use by most Tungsten customers. It contains the following key improvements:

  • Introduces quorum into Tungsten clusters to help avoid split-brain problems caused by network partitions. Cluster members vote whenever a node becomes unresponsive and continue operating only if they are in the majority. This feature greatly reduces the chance of multiple live masters.

  • Enables automatic restart of managers after network hangs that disrupt communications between managers. This feature enables clusters to ride out transient problems with physical hosts such as storage becoming inaccessible or high CPU usage that would otherwise cause cluster members to lose contact with each other, thereby causing application outages or manager non-responsiveness.

  • Adds "witness-only managers", which replace the previous witness hosts. Witness-only managers participate in quorum computation but do not manage a DBMS. This feature allows two-node clusters to operate reliably across Amazon availability zones and in any environment where managers run on separate networks.
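
The majority rule described in the two quorum items above can be sketched as follows. The member model and function names here are illustrative, not Tungsten's actual API; the point is that witness-only members count toward the majority without hosting a database.

```python
from dataclasses import dataclass

@dataclass
class Member:
    name: str
    witness_only: bool = False  # counts toward quorum but manages no DBMS

def has_quorum(reachable: list, total: int) -> bool:
    # A partition may continue operating only if it contains a strict
    # majority of the configured cluster members.
    return len(reachable) > total // 2

# Two DB nodes plus one witness-only manager in a third location:
members = [Member("db1"), Member("db2"), Member("witness1", witness_only=True)]

# If db2's network is partitioned away, db1 and the witness still
# form a majority (2 of 3), so that side keeps running:
surviving = [m for m in members if m.name != "db2"]
print(has_quorum(surviving, len(members)))  # True
```

With an even split and no witness, neither side would hold a strict majority, and both sides would stop rather than risk two live masters.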

  • Numerous minor improvements to cluster configuration files to eliminate and/or document product settings for simpler and more reliable operation.

Continuent recommends that customers who are awaiting specific fixes for the 1.5.3 release consider upgrading to Release 1.5.4 as soon as it is generally available. All other customers should consider upgrading to Release 2.0.1 as soon as it is convenient. In addition, we recommend that all new projects start with version 2.0.1.

Behavior Changes

The following changes have been made to Continuent Tungsten and may affect existing scripts and integration tools. Any scripts or environments that make use of these tools should be checked and updated for the new behavior:

  • Failover could be rolled back because of a failure to release a Virtual IP. Such a failure now triggers a warning instead of rolling back the failover.

    Issues: TUC-1666

  • An 'UnknownHostException' would cause a failover. The behavior has been updated so that the DB server is instead marked as suspect.

    Issues: TUC-1667

  • Failover does not occur if the manager on the master host is not running at the time the database server is stopped.

    Issues: TUC-1900

Improvements, new features and functionality

  • Installation and Deployment

    • tpm should validate connector defaults that would fail an upgrade.

      Issues: TUC-1850

    • Improve tpm error message when running from wrong directory.

      Issues: TUC-1853

  • Tungsten Connector

    • Add support for MySQL cursors in the connector.

      Issues: TUC-1411

    • Connector must forbid zero keepAliveTimeout.

      Issues: TUC-1714

    • In SOR deployments only, the Connector logs show the relay data service being added twice.

      Issues: TUC-1720

    • Change the default value of the delayBeforeOfflineIfNoManager router property to 30s and constrain it to a maximum of 60s in the code.

      Issues: TUC-1752
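
As an illustration, the new default corresponds to a router property setting like the following; only the property name comes from this note, and the surrounding file layout is assumed:

```
# Seconds the router waits before going offline when no manager is reachable.
# The default is now 30; the code constrains the value to at most 60.
delayBeforeOfflineIfNoManager=30
```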

    • Router Manager connection timeout should be a property.

      Issues: TUC-1754

    • Reject server versions that do not start with a number.

      Issues: TUC-1776

    • Add client IP and port when logging connector message.

      Issues: TUC-1810

    • Make the tungsten cluster status output more SQL-like and reduce the amount of information displayed.

      Issues: TUC-1814

    • Allow connections without a schema name.

      Issues: TUC-1829
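
For example, a JDBC client could now connect through the connector with no default schema in the URL; the host, port, and credentials below are illustrative:

```
jdbc:mysql://connector-host:3306/?user=app_user&password=secret
```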

  • Other Issues

    • Remove old/extra/redundant configuration files.

      Issues: TUC-1721

Bug Fixes

  • Installation and Deployment

    • Within tpm, the witness host was previously required but was not validated

      Issues: TUC-1733

    • Ruby tests should abort if installation fails

      Issues: TUC-1736

    • Test witness hosts on startup of the manager and have the manager exit if there are any invalid witness hosts.

      Issues: TUC-1773

    • Installation fails with Ruby 1.9.

      Issues: TUC-1800

    • When using tpm to start from a specific event, the correct directory would not be used for the selected method.

      Issues: TUC-1824

    • When specifying a witness host with tpm, the check works for IP addresses but fails when using host names.

      Issues: TUC-1833

    • Cluster members do not reliably form a group following installation.

      Issues: TUC-1852

    • Installation fails with Ruby 1.9.1.

      Issues: TUC-1868

  • Command-line Tools

    • Nagios check scripts do not pick up shunned datasources

      Issues: TUC-1689

  • Cookbook Utility

    • Cookbook should not specify witness hosts in default configuration files etc.

      Issues: TUC-1734

  • Backup and Restore

    • Restore with xtrabackup empties the data directory and then fails.

      Issues: TUC-1849

    • A recovered datasource does not always come online when in automatic policy mode

      Issues: TUC-1851

    • Restore on datasource in slave dataservice fails to reload.

      Issues: TUC-1856

    • After a restore, the datasource is welcomed and put online but never reaches the online state.

      Issues: TUC-1861

    • A restore that occurs immediately after a recover from dataserver failure always fails.

      Issues: TUC-1870

  • Core Replicator

    • LOAD (LOCAL) DATA INFILE would fail if the request starts with white spaces.

      Issues: TUC-1639

    • Null values are not correctly handled in keys for row events

      Issues: TUC-1823

  • Tungsten Connector

    • Connector fails to send back full result of stored procedure called by prepared statement (pass through mode on).

      Issues: TUC-36

    • Router gateway can prevent manager startup if the connector is started before the manager

      Issues: TUC-850

    • The Tungsten show processlist command would throw NPE errors.

      Issues: TUC-1136

    • The default SQL Router properties use the wrong load balancer

      Issues: TUC-1437

    • Router must go into fail-safe mode if it loses connectivity to a manager during a critical command.

      Issues: TUC-1549

    • When in a SOR deployment, the Connector will never return connections for requests with RO_RELAXED and affinity set to a relay-node-only site.

      Issues: TUC-1620

    • Affinity not honored when using direct connections.

      Issues: TUC-1628

    • An attempt to load a driver listener class can cause the connector to hang at startup.

      Issues: TUC-1669

    • Broken connections are returned to the c3p0 pool; further use of these connections will show errors.

      Issues: TUC-1683

    • The connector tungsten flush privileges command causes a temporary outage (denies new connection requests).

      Issues: TUC-1730

    • Connector should require a valid manager to operate even when in maintenance mode.

      Issues: TUC-1781

    • Session variables support for row replication

      Issues: TUC-1784

    • Connector allows connections to an offline/on-hold composite dataservice.

      Issues: TUC-1787

    • Router notifications are being sent to routers via GCS. This is unnecessary since a manager only updates routers that are connected to it.

      Issues: TUC-1790

    • Pass-through mode does not correctly handle multiple results in 1.5.4.

      Issues: TUC-1792

    • SmartScale fails when a database is created and used immediately.

      Issues: TUC-1836

  • Tungsten Manager

    • A manager that cannot see itself as a part of a group should fail safe and restart

      Issues: TUC-1722

    • Retry of tests for networking failure does not work in the manager/rules

      Issues: TUC-1723

    • The 'vip check' command produces a scary message in the manager log if a VIP is not defined

      Issues: TUC-1772

    • Restored Slave did not change to correct master

      Issues: TUC-1794

    • If a manager leaves a group due to a brief outage and does not restart, it remains stranded from the rest of the group but still 'thinks' it is part of the group. This was a main cause of hangs and restarts during operations.

      Issues: TUC-1830

    • Failover of relay aborts when relay host reboots, leaving data sources of slave service in shunned or offline state.

      Issues: TUC-1832

    • The recover command completes but cannot welcome the datasource, leading to a failure in tests.

      Issues: TUC-1837

    • After failover on primary master, relay datasource points to wrong master and has invalid role.

      Issues: TUC-1858

    • A stopped dataserver would not be detected if cluster was in maintenance mode when it was stopped.

      Issues: TUC-1860

    • Manager attempts to get status of remote replicator from the local service - causes a failure to catch up from a relay.

      Issues: TUC-1864

    • Using the recover using command can result in more than one service in a composite service having a master, leaving the composite service with two masters.

      Issues: TUC-1874

    • The recover using command recovers a datasource to a master when it should recover it to a relay.

      Issues: TUC-1882

    • ClusterManagementHandler can read/write datasources directly from the local disk, which can corrupt cluster configuration information.

      Issues: TUC-1899

  • Platform Specific Deployments

    • FreeBSD: Replicator hangs when going offline. Can cause switch to hang/abort.

      Issues: TUC-1668