7.12.10. Adjusting the Connector Response to Resource Losses

This section describes how to control the Connector responses in the event of the loss of a required Datasource or all Managers.

7.12.10.1. Adjusting the Connector Response to Datasource Loss

Summary: Whenever no Primary datasource is found, the Connector will reject connection requests.

This feature controls how long the Connector waits for the given type of DataSource to come ONLINE before forcibly disconnecting the client application.

By default, the Connector waits indefinitely for the resource to become available.

Warning

Prior to software versions 5.3.2/6.0.1, the ONHOLD state would reject new connection attempts instead of pausing them. Also, waitIfUnavailableTimeout was ignored, and connection attempts were never severed after timeout.

There are two (2) parameters involved in this decision-making. They are:

  • waitIfUnavailable (default: true)

    If waitIfUnavailable is true, then the Connector will wait for up to the time period specified by waitIfUnavailableTimeout to make a connection for a given QOS. If the timeout expires, the Connector will disconnect the client application (reject connection attempts and close ongoing connections).

    If waitIfUnavailable is false, the Connector will immediately disconnect the client with an error if a connection for a given QOS cannot be made immediately.

  • waitIfUnavailableTimeout (default: 0, wait indefinitely)

    If waitIfUnavailable is true, the Connector will wait for up to waitIfUnavailableTimeout number of seconds before disconnecting the client. If waitIfUnavailable is false, this parameter is ignored. If this parameter is set to zero (0) seconds, the Connector will wait indefinitely (client connection requests will hang forever).

waitIfUnavailable applies specifically to data source availability, and is only considered when everything else (the Connector and the data service) is online.

This is typically relevant during a switch or failover, while the Primary changes: the client application requests a Primary (RW_STRICT QoS) which, for a short period, is unavailable because both the old and new Primaries are offline. With waitIfUnavailable=true, the Connector waits for the new Primary to come online (up to waitIfUnavailableTimeout seconds), allowing seamless failover. If set to false, there will be a period during the switch or failover when client applications receive errors both on new connections and when retrying failed requests.
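If rejecting connections immediately is too abrupt, a bounded wait may be preferable. The following is a sketch only (Staging method shown; the 30-second value is purely illustrative), capping the wait instead of waiting indefinitely:

shell> ./tools/tpm configure alpha \
    --property=waitIfUnavailable=true \
    --property=waitIfUnavailableTimeout=30
shell> ./tools/tpm update

With this in place, connection requests made during a switch or failover will wait up to 30 seconds for the new Primary instead of hanging indefinitely; if no Primary comes online within that window, the client is disconnected.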

For example, to immediately reject connections upon Datasource loss:

The following examples show both the Staging and INI configuration methods.

Staging method:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --property=waitIfUnavailable=false

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

INI method:

shell> vi /etc/tungsten/tungsten.ini
[alpha]
...
property=waitIfUnavailable=false

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

Warning

Setting waitIfUnavailable=false makes switch and failover much less transparent to the application, since connections will receive errors until the new Primary is elected and back online.

Important

Updating these values requires a Connector restart (performed as part of tpm update) for the changes to be recognized.

These entries will NOT work if placed into [defaults]; each service must be handled individually.

7.12.10.2. Adjusting the Connector Response to Manager Loss

Summary: Whenever the Connector loses sight of the managers for a given data service, it will either suspend or reject new connection requests.

waitIfDisabled applies to both:

  1. Whole data service offline: the client application tries to connect to a composite or physical data service that is offline, for example during a full site outage where the client application requests access to a local Primary without allowing redirection to the remote site.

  2. Connector ON-HOLD or OFFLINE: typically, when the Connector loses connectivity to all Managers in the cluster, it first goes ON-HOLD, then OFFLINE. In both cases, waitIfDisabled defines what to do: either throw an error to the client application, or wait until the network is back and a Manager can be reached. For example, when the Connector is isolated from the cluster, setting waitIfDisabled=true will make new connection requests "hang" until either the Connector regains network access to a Manager OR waitIfDisabledTimeout is reached.

By default, the Connector suspends requests indefinitely until Manager communications are re-established.

This feature controls how long the Connector waits during a Manager loss event before either suspending or rejecting client connections.

Here is the decision chain and associated settings for what happens when the connector loses sight of the managers:

  1. Delay for delayBeforeOnHoldIfNoManager seconds (default: 0, i.e. no delay).

  2. Change state to ON-HOLD and begin the countdown timer starting from the delayBeforeOfflineIfNoManager value.

    In the ON-HOLD state, the connector will hang all new connections and allow existing connections to continue.

  3. When the delayBeforeOfflineIfNoManager timer expires (30 seconds by default), change state to OFFLINE.

    Once OFFLINE, the Connector will break existing connections because there is no authoritative Manager node from the Connector's perspective. Without a Manager link, any change to the cluster configuration will remain invisible to the Connector, potentially leading to writes on a Replica node.

    By default, all new connection requests will hang in the OFFLINE state. If waitIfDisabled is set to false, then the Connector will instead reject all new connections.

There are multiple parameters involved in this decision-making. They are:

  • delayBeforeOnHoldIfNoManager (in seconds, default: 0, i.e. no delay)

    When the connector loses sight of the managers, delay before going ON-HOLD for the value of delayBeforeOnHoldIfNoManager seconds, which is 0/no delay by default.

  • delayBeforeOfflineIfNoManager (in seconds, default: 30)

    Once ON-HOLD, delay before going OFFLINE for the value of delayBeforeOfflineIfNoManager seconds, 30 by default.

  • waitIfDisabled (default: true)

    If the Dataservice is OFFLINE because it is unable to communicate with any Manager, the waitIfDisabled parameter determines whether to suspend connection requests or to reject them. If waitIfDisabled is true (the default), then the Connector will wait indefinitely for manager communications to be re-established. If waitIfDisabled is set to false, the Connector will return an error immediately.

    To check for data service state, use the tungsten-connector/bin/connector cluster-status command. For example:

    shell> connector cluster-status
    Executing Tungsten Connector Service --cluster-status ...
    +--------------+--------------------+-------------+--------------+--------+--------+--------------------------------------+------------------+-----------------+------------------+--------------------+---------------------+
    | Data service | Data service state | Data source | Is composite | Role   | State  | High water                           | Last shun reason | Applied latency | Relative latency | Active connections | Connections created |
    +--------------+--------------------+-------------+--------------+--------+--------+--------------------------------------+------------------+-----------------+------------------+--------------------+---------------------+
    | europe       | OFFLINE            | c1          | false        | master | ONLINE | 0(c1-bin.000002:0000000000000510;-1) | MANUALLY-SHUNNED | 0.0             | 5193.0           | 0                  | 3                   |
    | europe       | OFFLINE            | c2          | false        | slave  | ONLINE | 0(c1-bin.000002:0000000000000510;-1) |                  | 1.0             | 5190.0           | 0                  | 0                   |
    | europe       | OFFLINE            | c3          | false        | slave  | ONLINE | 0(c1-bin.000002:0000000000000510;-1) |                  | 2.0             | 5190.0           | 0                  | 1                   |
    +--------------+--------------------+-------------+--------------+--------+--------+--------------------------------------+------------------+-----------------+------------------+--------------------+---------------------+

For more information, see Connector On-Hold State.
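Taken together, the three parameters can be tuned in one place. The following tungsten.ini sketch (INI method; the values of 5 and 60 seconds are chosen purely for illustration) tolerates a 5-second Manager blip before pausing new connections, holds them for a further 60 seconds, and then rejects new connections with an error instead of hanging; for Staging-method deployments the equivalent --property options apply:

shell> vi /etc/tungsten/tungsten.ini
[alpha]
...
property=delayBeforeOnHoldIfNoManager=5
property=delayBeforeOfflineIfNoManager=60
property=waitIfDisabled=false

As with the other examples in this section, run tpm update afterwards so the resulting Connector restart picks up the changes.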

For example, to decrease the ON-HOLD time to 15 seconds:

The following examples show both the Staging and INI configuration methods.

Staging method:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging USER is `tpm query staging| cut -d: -f1 | cut -d@ -f1`
The staging USER is tungsten

shell> echo The staging HOST is `tpm query staging| cut -d: -f1 | cut -d@ -f2`
The staging HOST is db1

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> ssh {STAGING_USER}@{STAGING_HOST}
shell> cd {STAGING_DIRECTORY}
shell> ./tools/tpm configure alpha \
    --property=delayBeforeOfflineIfNoManager=15

Run the tpm command to update the software with the Staging-based configuration:

shell> ./tools/tpm update

For information about making updates when using a Staging-method deployment, please see Section 10.3.7, “Configuration Changes from a Staging Directory”.

INI method:

shell> vi /etc/tungsten/tungsten.ini
[alpha]
...
property=delayBeforeOfflineIfNoManager=15

Run the tpm command to update the software with the INI-based configuration:

shell> tpm query staging
tungsten@db1:/opt/continuent/software/tungsten-clustering-7.1.4-10

shell> echo The staging DIRECTORY is `tpm query staging| cut -d: -f2`
The staging DIRECTORY is /opt/continuent/software/tungsten-clustering-7.1.4-10

shell> cd {STAGING_DIRECTORY}

shell> ./tools/tpm update

For information about making updates when using an INI file, please see Section 10.4.4, “Configuration Changes with an INI file”.

For example, to immediately reject connections upon Manager loss:

Staging method:

shell> ./tools/tpm configure alpha \
    --property=waitIfDisabled=false

INI method:

[alpha]
...
property=waitIfDisabled=false

Warning

Setting waitIfDisabled=false makes switch and failover much less transparent to the application, since connections will receive errors until communications with at least one Manager have been re-established and the Connector is back online.

Important

Updating these values requires a Connector restart (performed as part of tpm update) for the changes to be recognized.

These entries will NOT work if placed into [defaults]; each service must be updated individually.
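As a hypothetical illustration of the per-service requirement, a deployment with two services (named alpha and beta here purely for the example) would repeat the property under each service stanza rather than placing it in [defaults]:

[alpha]
...
property=waitIfDisabled=false

[beta]
...
property=waitIfDisabled=false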