F.4.1. Understanding Tungsten Replicator Memory Tuning
Replicators are implemented as Java processes, which use two types of
memory: stack space, which is allocated per running thread and holds
objects that are allocated within individual execution stack frames, and
heap memory, which is where objects that persist across individual method
calls live. Stack space is rarely a problem for Tungsten, as replicators
rarely run more than 200 threads and use limited recursion; the Java
defaults are almost always sufficient. Heap memory, on the other hand,
runs out if the replicator has too many transactions in memory at once.
This results in the dreaded Java OutOfMemory exception, which causes the
replicator to stop operating. When this happens you need to look at tuning
the replicator memory size.
To understand replicator memory usage, we need to look into how
replicators work internally. Replicators use a "pipeline" model of
execution that streams transactions through one or more concurrently
executing stages. As you can see from the attached diagram, a slave
pipeline might have a stage to read transactions from the master and put
them in the THL, a stage to read them back out of the THL into an
in-memory queue, and a stage to apply those transactions to the slave.
This model ensures high performance as the stages work independently. This
streaming model is quite efficient and normally permits Tungsten to
transfer even exceedingly large transactions, as the replicator breaks
them up into smaller pieces called transaction fragments.
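The streaming model above can be sketched with standard Java concurrency primitives. This is an illustrative toy, not Tungsten's implementation; all class and variable names are invented. A "read" stage feeds a bounded in-memory queue while an "apply" stage drains it concurrently, so the two run independently:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative pipeline sketch: two stages joined by a bounded in-memory queue.
public class PipelineSketch {
    public static List<String> run(int txnCount) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10); // bounded queue between stages
        List<String> applied = new ArrayList<>();

        // Stage 1: read transactions and put them in the queue.
        Thread reader = new Thread(() -> {
            for (int i = 1; i <= txnCount; i++) {
                try { queue.put("txn-" + i); } catch (InterruptedException e) { return; }
            }
        });
        // Stage 2: take transactions off the queue and "apply" them.
        Thread applier = new Thread(() -> {
            for (int i = 1; i <= txnCount; i++) {
                try { applied.add(queue.take()); } catch (InterruptedException e) { return; }
            }
        });
        reader.start(); applier.start();
        reader.join(); applier.join();
        return applied;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(100).size() + " transactions applied");
    }
}
```

Because the queue is bounded, a fast reader cannot flood memory; it blocks until the applier catches up, which is the same back-pressure idea the replicator's in-memory queues provide.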
The pipeline model has consequences for memory management. First of all,
replicators are doing many things at once, hence need enough memory to hold
all current objects. Second, the replicator works fastest if the in-memory
queues between stages are large enough that they never become empty.
This keeps delays in upstream processing from delaying things at the end
of the pipeline. It also allows replicators to make use of block commit.
Block commit is an important performance optimization in which stages try
to commit many transactions at once on slaves to amortize the cost of
commit. In block commit the end stage continues to commit transactions
until it either runs out of work (i.e., the upstream queue becomes empty)
or it hits the block commit limit. Larger upstream queues help keep the
end stage from running out of work, hence increase efficiency.
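The block commit loop described above can be sketched as follows. This is a simplified illustration, not Tungsten's actual code; the class and method names are invented. The end stage keeps applying transactions until the upstream queue runs dry or the block commit limit is reached, then issues one commit for the whole block:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative block-commit sketch: amortize commit cost over many transactions.
public class BlockCommitSketch {
    static int commits = 0;                          // number of commits issued
    static List<String> applied = new ArrayList<>(); // transactions "applied" so far

    static void applyWithBlockCommit(Queue<String> upstream, int blockCommitLimit) {
        while (!upstream.isEmpty()) {
            int inBlock = 0;
            // Apply until the queue empties or the block limit is hit.
            while (!upstream.isEmpty() && inBlock < blockCommitLimit) {
                applied.add(upstream.poll()); // apply one transaction
                inBlock++;
            }
            commits++; // a single commit covers the whole block
        }
    }

    public static void main(String[] args) {
        Queue<String> q = new ArrayDeque<>();
        for (int i = 1; i <= 25; i++) q.add("txn-" + i);
        applyWithBlockCommit(q, 10);
        // 25 transactions with a block limit of 10 -> 3 commits (10 + 10 + 5)
        System.out.println(applied.size() + " applied, " + commits + " commits");
    }
}
```

Note how a larger, well-fed upstream queue lets each block run to the full limit, while an empty queue forces an early, less efficient commit.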
Bearing this in mind, we can alter replicator behavior in a number of ways
to make it use less memory or to handle larger amounts of traffic without
getting a Java OutOfMemory error. You should look at each of these when
tuning replicator memory usage.
The wrapper.java.memory property in the wrapper.conf
file controls the amount of heap
memory available to replicators. 1024 MB is the minimum setting for
most replicators. Busy replicators, those that have multiple services,
or replicators that use parallel apply should consider using 2048 MB
instead. If you get a Java OutOfMemory exception, you should first try
raising the current setting to a higher value. This is usually enough
to get past most memory-related problems. This value can also be set
at installation time.
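For example, raising the heap to 2048 MB in wrapper.conf would look something like this (check your own wrapper.conf for the exact location of the property):

```
wrapper.java.memory=2048
```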
If you set the heap memory to a very large value (e.g. over 3 GB), you
should also consider enabling concurrent garbage collection. Java by
default uses mark-and-sweep garbage collection, which may result in
long pauses during which network calls to the replicator may fail.
Concurrent garbage collection uses more CPU cycles and reduces
on-going performance a bit but avoids periods of time during which the
replicator is non-responsive. This behavior can also be enabled at
installation time.
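As an illustration, on older JVMs concurrent collection is typically enabled with the CMS collector flag. The exact mechanism for passing extra JVM options through wrapper.conf depends on your wrapper version; the wrapper.java.additional entry shown here is an assumption:

```
# Assumed wrapper.conf entry; adjust the wrapper.java.additional index to suit your file
wrapper.java.additional.3=-XX:+UseConcMarkSweepGC
```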
The global buffer size setting in the replicator properties file
controls two things: the size of the in-memory queues in the
replicator as well as the block commit size. If you still have
problems after increasing the heap size, try reducing this value. It
reduces the number of objects simultaneously stored on the Java heap.
A value of 2 is a good setting to try to get around temporary
problems. This can also be set at installation time.
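To see why a smaller buffer reduces heap pressure, consider a back-of-the-envelope estimate. The formula and numbers below are hypothetical, not an official Tungsten sizing rule: heap held by queued transactions scales roughly with queue capacity times average in-memory transaction size times the number of queued stages.

```java
// Hypothetical back-of-the-envelope estimate (not a Tungsten formula):
// heap for queued transactions ~ queue capacity x avg transaction size x queued stages
public class QueueMemorySketch {
    static long estimateBytes(int queueCapacity, long avgTxnBytes, int stagesWithQueues) {
        return (long) queueCapacity * avgTxnBytes * stagesWithQueues;
    }

    public static void main(String[] args) {
        // 10-slot queues, 4 MB average transaction, 2 queued stages -> 80 MB
        System.out.println(estimateBytes(10, 4L * 1024 * 1024, 2) / (1024 * 1024) + " MB");
    }
}
```

Under these made-up numbers, dropping the queue capacity from 10 to 2 cuts the queued-transaction footprint from 80 MB to 16 MB, at some cost to block commit efficiency.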
The block commit count setting in the replicator properties file sets
the block commit count in the final stage in a slave pipeline. If you
reduce the global buffer size, it is a good idea to set this to a
fixed size, such as 10, to avoid reducing the block commit effect too
much. Very low block commit values in this stage can cut update rates
on slaves by 50% or more in some cases. This can also be set at
installation time.
The transaction fragment size setting in the
replicator.properties file controls the size
of fragments for long transactions. Tungsten automatically breaks up
long transactions into fragments; this parameter controls the number
of bytes of binlog per transaction fragment. You can try making this
value smaller to reduce overall memory usage if many transactions are
simultaneously present. Normally, however, this value has minimal
impact.
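Conceptually, fragmentation slices a transaction's binlog bytes into fixed-size pieces. A minimal sketch, with invented names rather than Tungsten's actual code:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: split a large transaction's binlog bytes into
// fragments of at most fragSizeBytes each.
public class FragmentSketch {
    static List<byte[]> fragment(byte[] binlog, int fragSizeBytes) {
        List<byte[]> frags = new ArrayList<>();
        for (int off = 0; off < binlog.length; off += fragSizeBytes) {
            int len = Math.min(fragSizeBytes, binlog.length - off);
            byte[] frag = new byte[len];
            System.arraycopy(binlog, off, frag, 0, len);
            frags.add(frag); // each fragment can be processed and released independently
        }
        return frags;
    }

    public static void main(String[] args) {
        byte[] txn = new byte[2_500_000];              // a 2.5 MB transaction
        List<byte[]> frags = fragment(txn, 1_000_000); // 1 MB fragment size
        // 2.5 MB split at 1 MB -> 3 fragments (1 MB, 1 MB, 0.5 MB)
        System.out.println(frags.size() + " fragments");
    }
}
```

The key point for memory tuning is that only individual fragments, not the whole transaction, need to sit on the heap at once; a single huge statement or row change cannot be split this way, which is why such changes can still exhaust memory.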
Finally, it is worth mentioning that the main cause of out-of-memory
conditions in replicators is large transactions. In particular, Tungsten
cannot fragment individual statements or row changes, so changes to very
large column values can also result in OutOfMemory conditions. For now the
best approach is to raise memory, as described above, and change your
application to avoid such transactions.