Wednesday 3 October 2012

Talend - tjdbcoutput (other database output components) performance improvement - use of Batchsize / enable parallel execution parameter under Advance settings


Issue - While working with tjdbcoutput parameter - one of our source table was big and when we were inserting data into the target we found slow performance with tjdbcoutput component. Later we took help of batchsize and enable parallel execution features under advance settings to get better performance



Solution – Use of batchsize/commit frequence/enable parallel execution in Advance settings helps in getting good performance - usage of batchsize parameter will be more clear if you look at code generated by talend. What talend does is as below in the code

Read data from database in a rs_input (record set) by executing input component SQL.
Next it loops through each record till it reaches the batchsize counter and in each iteration of loop it prepares the INSERT /delete/update statement as asked by output component in our case tjdbcoutput. Next when this batchsize counter is reached talend executes this batch of sql statements against DB.

So if you increase the batchsize – you will get more throughput or rather more rows will flow to your tjdbdoutput component in one batch and you will get better performance at DB level as to and fro between talend and DB will reduce as you will have more rows going in one batch and less number of batch trips between talend and DB.
Another point where you can receive good performance gains is by adjusting your commit frequency (number of rows to commit) – try to play with different options – but we found that the closest or equal it is to batchsize it helps.

Enable parallel execution – is very useful feature if you want to do tjdbcoutput operation in parallel threads. It again depends on your database settings – but its worth trying to set it 2 or 4 and see whether you gain benefit as what it does is that – if it is set to 4 it will fire 4 batches of sql statements (insert/delete/update/ as the case be) and if you have not used EXISTING connections then you can also see 4 new connections in this case.
There is no hard and fast setting for these - you have arrive at what works best for your case.

2 comments: