4.2. Distributed Datasource Groups

Note

This feature was introduced in v7.1.0

4.2.1. Introduction to DDG

At their core, Tungsten Distributed Datasource Groups (DDG) are simply a single Standalone cluster with an odd number of nodes, as usual.

In addition, every node in the cluster uses the same [serviceName], also as usual. The key differences here are that:

  • Each node in the cluster is assigned a Distributed Datasource Group ID (DDG-ID)

  • Nodes with the same DDG-ID will act as if they are part of a separate cluster, limiting failovers to nodes inside the group until there are no more failover candidates, at which time a node in a different group will be selected as the new Primary during a failover.

This means that you would assign nodes in the same region or datacenter the same DDG-ID.

There is still only a single write Primary amongst all the nodes in all the regions, just like Composite Active/Passive (CAP).

Unlike CAP, if all nodes in the datacenter containing the Primary node were gone, a node in a different location would be promoted to Primary.

The networks between the datacenters or regions must have low, LAN-like latency for this feature to work properly.

Also, the node in the same group with the most THL downloaded will be selected as the new Primary. If no node is available in the same group, the node with the most THL available is selected from a different group.
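
This selection preference can be illustrated with a short conceptual sketch. This is not the Manager's actual implementation; the Candidate structure, hostnames, and selection function below are purely illustrative, using the latest applied THL sequence number as a stand-in for "most THL downloaded":

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    host: str
    ddg_id: int        # Distributed Datasource Group ID of this Replica
    latest_seqno: int  # proxy for how much THL the node has downloaded

def pick_new_primary(failed_primary_ddg_id: int,
                     candidates: List[Candidate]) -> Optional[Candidate]:
    """Prefer a Replica in the failed Primary's group; otherwise consider all groups."""
    same_group = [c for c in candidates if c.ddg_id == failed_primary_ddg_id]
    pool = same_group if same_group else candidates
    # Within the chosen pool, the node with the most THL wins.
    return max(pool, key=lambda c: c.latest_seqno) if pool else None

# replica-a2 shares the failed Primary's group (100), so it is preferred over
# replica-b1 even though replica-b1 has downloaded slightly more THL.
print(pick_new_primary(100, [
    Candidate("replica-a2", 100, 4711),
    Candidate("replica-b1", 200, 4712),
]))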

4.2.2. How DDG Works

To illustrate the new topology, imagine a 5-node standard cluster spanning 3 datacenters with 2 nodes in DC-A, 2 nodes in DC-B and 1 node in DC-C.

Nodes in DC-A have DDG-ID of 100, nodes in DC-B have DDG-ID of 200, and nodes in DC-C have DDG-ID of 300.
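
In terms of the datasource-group-id option covered in Section 4.2.3, “Configuring DDG”, this means the [defaults] section of /etc/tungsten/tungsten.ini on each node carries the ID for its own datacenter. Shown together here for illustration (each node only contains its own entry):

DC-A nodes:

[defaults]
datasource-group-id=100

DC-B nodes:

[defaults]
datasource-group-id=200

DC-C node:

[defaults]
datasource-group-id=300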

Below are the failure scenarios and resulting actions:

  • Primary fails

    • Failover to any healthy Replica in the same Region/Datacenter (virtual ID group)

  • Entire Region/Datacenter containing the Primary node fails

    • Failover to any healthy Replica in a different Region/Datacenter (virtual ID group)

  • Network partition between any two Regions/Datacenters

    • No action; quorum is maintained by the majority of Managers.

    • Application servers not in the Primary Datacenter will fail to connect

  • Network partition between all Regions/Datacenters

    • All nodes FAILSAFE/SHUNNED

  • Any two Regions/Datacenters offline

    • All nodes FAILSAFE/SHUNNED (see the quorum sketch after this list)
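
These FAILSAFE/SHUNNED outcomes follow from quorum arithmetic: with 5 Managers a majority of at least 3 is required, and losing any two datacenters in this example leaves at most 2 Managers reachable. The sketch below is illustrative only, not Manager code:

# Managers per datacenter in the example topology
managers = {"DC-A": 2, "DC-B": 2, "DC-C": 1}
majority = sum(managers.values()) // 2 + 1   # 3 of 5

def has_quorum(surviving_dcs):
    """True if the Managers in the surviving datacenters form a majority."""
    return sum(managers[dc] for dc in surviving_dcs) >= majority

print(has_quorum({"DC-A", "DC-B"}))  # True  - one datacenter lost, cluster continues
print(has_quorum({"DC-A"}))          # False - two datacenters lost, FAILSAFE/SHUNNED
print(has_quorum({"DC-C"}))          # False - two datacenters lost, FAILSAFE/SHUNNED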

Note

Manual intervention to recover the cluster will be required any time the cluster is placed into the FAILSAFE/SHUNNED state.
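
As a rough, hypothetical outline only: recovery generally involves taking the cluster out of automatic policy from within cctrl, recovering the shunned datasources, and then re-enabling automatic policy. The commands below assume the standard cctrl set policy and recover commands; the exact steps for a FAILSAFE/SHUNNED recovery depend on the nature of the failure and should be taken from the cluster recovery procedures:

set policy maintenance
recover
set policy automatic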

When configured as per the above example, the ls output from within cctrl will look like the following:

DATASOURCES:
+---------------------------------------------------------------------------------+
|db1-demo.continuent.com(master:ONLINE, progress=0, THL latency=0.495)            |
|STATUS [OK] [2023/06/23 05:46:52 PM UTC][SSL]                                    |
|DATASOURCE GROUP(id=100)                                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=master, state=ONLINE)                                          |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=0, active=0)                                               |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|db2-demo.continuent.com(slave:ONLINE, progress=0, latency=0.978)                 |
|STATUS [OK] [2023/06/23 05:46:51 PM UTC][SSL]                                    |
|DATASOURCE GROUP(id=100)                                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=slave, master=db1-demo.continuent.com, state=ONLINE)           |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=4, active=0)                                               |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|db3-demo.continuent.com(slave:ONLINE, progress=0, latency=0.705)                 |
|STATUS [OK] [2023/06/23 05:46:51 PM UTC][SSL]                                    |
|DATASOURCE GROUP(id=200)                                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=slave, master=db1-demo.continuent.com, state=ONLINE)           |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=4, active=0)                                               |
+---------------------------------------------------------------------------------+
+---------------------------------------------------------------------------------+
|db4-demo.continuent.com(slave:ONLINE, progress=0, latency=2.145)                 |
|STATUS [OK] [2023/06/23 05:46:54 PM UTC][SSL]                                    |
|DATASOURCE GROUP(id=200)                                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
|  REPLICATOR(role=slave, master=db1-demo.continuent.com, state=ONLINE)           |
|  DATASERVER(state=ONLINE)                                                       |
|  CONNECTIONS(created=0, active=0)                                               |
+---------------------------------------------------------------------------------+

WITNESSES:
+---------------------------------------------------------------------------------+
|db5-demo.continuent.com(witness:ONLINE)                                          |
|DATASOURCE GROUP(id=300)                                                         |
+---------------------------------------------------------------------------------+
|  MANAGER(state=ONLINE)                                                          |
+---------------------------------------------------------------------------------+

4.2.3. Configuring DDG

Configuration is straightforward: pick an integer ID for each set of nodes you wish to group together, usually based upon location such as region or datacenter.

Follow the steps for deploying and configuring a standard cluster detailed in Section 3.1, “Deploying Standalone HA Clusters”, with one simple addition to the configuration: add a new line to the [defaults] section of the /etc/tungsten/tungsten.ini file on every node, including Connector-only nodes, for example:

[defaults]
datasource-group-id=100

The new tpm configuration option datasource-group-id defines which Distributed Datasource Group that node belongs to. The new entry must be in the [defaults] section of the configuration.

Omitting datasource-group-id from your configuration, or setting the value to 0, disables this feature. Any positive integer (>0) enables DDG.
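
If DDG is being enabled on an existing cluster, add the option to the INI file on every node and then apply the change with tpm. A minimal sketch, assuming an INI-based installation with tpm available in the PATH (check the tpm update documentation for the exact invocation for your deployment):

shell> vi /etc/tungsten/tungsten.ini   # add datasource-group-id to [defaults] on every node
shell> tpm update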