prepare() is called when the replicator goes online
begin() is called before a single transaction starts
apply() is called to copy and load the raw CSV data
commit() is called after the raw data has been loaded
release() is called when the replicator goes offline
CSV files are divided across the scripts. If there is a large number of files that all take about the same time to load and there are three threads (parallelization=3), each individual load script will see about a third of the files. You should therefore not code assumptions that you have seen all tables or CSV files in a single script.
Parallel load script is only recommended for data sources like Hadoop that are idempotent. When applying to a data source that is non-idempotent (for example MySQL or potentially Vertica) you should just use a single thread.