Character sets are a headache in batch loading because all updates are
written to and read from CSV files, which can result in invalid transactions
along the replication path. Such problems are very difficult to debug.
The following tips improve the chances of trouble-free replication:
- Use UTF8 character sets consistently for all string and text data.
- Force Tungsten to convert data to Unicode rather than transferring
  strings as raw bytes:
    tpm ... --mysql-use-bytes-for-string=false
- When starting the replicator for MySQL replication, include the
  following tpm option:
    tpm ... --java-file-encoding=UTF8
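
Because an invalid byte sequence in a CSV file is hard to trace once it is
in the replication path, it can help to check a file's encoding directly.
The following is a minimal sketch and is not part of Tungsten. It assumes
the iconv utility is installed; the function name is_valid_utf8 and the
file path in the usage comment are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not a Tungsten command): succeed only if the
# given file decodes cleanly as UTF-8.
# iconv exits with a non-zero status when it meets an invalid byte sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example usage; the CSV path below is an assumption for illustration:
#   is_valid_utf8 /tmp/staging/orders.csv && echo "valid" || echo "invalid UTF-8"
```

A check like this only confirms that the bytes form valid UTF-8. It cannot
tell whether text was converted from the wrong source character set before
it was written.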