Friday 4 April 2014

Talend - Insert data from S3 into AWS Redshift using the COPY command

Let's say you need to move data from one of your source databases to AWS Redshift via Talend. This can be done with a database-specific input component (tPostgresqlInput, tMysqlInput, and so on) feeding a tRedshiftOutput component. However, tRedshiftOutput loads rows through SQL INSERT statements, which can be slow compared with Redshift's COPY command, which bulk-loads data files from S3 in parallel.
We can implement a COPY from an S3 file in Talend as follows.

The job in the screenshot below starts with a tPostgresqlInput component, which reads data from my source database.
I write this data to a flat file on my local machine/server using a tFileOutputDelimited component.
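To make the export step concrete outside Talend, here is a minimal Python sketch of what the input component plus tFileOutputDelimited do together: run a query and write each row to a pipe-delimited file. It assumes any DB-API 2.0 connection (for PostgreSQL that could be a psycopg2 connection); the function name export_query_to_delimited, the pipe delimiter and the batch size are hypothetical choices for illustration, not Talend settings.

```python
import csv
from typing import Any


def export_query_to_delimited(conn: Any, query: str, path: str,
                              delimiter: str = "|", batch_size: int = 1000) -> int:
    """Run `query` on a DB-API 2.0 connection and write every row to `path`.

    Hypothetical helper mirroring a <DB>Input -> tFileOutputDelimited flow.
    Returns the number of rows written (no header row is written).
    """
    cursor = conn.cursor()
    try:
        cursor.execute(query)
        written = 0
        # newline="" lets the csv module control line endings itself
        with open(path, "w", newline="", encoding="utf-8") as out:
            # QUOTE_MINIMAL (the default) wraps fields that contain the
            # delimiter, a double quote or a newline in double quotes
            writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
            while True:
                # Stream rows in batches instead of loading the whole result set
                rows = cursor.fetchmany(batch_size)
                if not rows:
                    break
                writer.writerows(rows)
                written += len(rows)
        return written
    finally:
        cursor.close()
```

Because fields containing the delimiter get quoted, a file written this way should be loaded with COPY's CSV option rather than with a plain DELIMITER alone.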
Once the file is written, I need to transfer it to Amazon S3. We can use the tS3Put component for this; it needs the bucket name plus the access key and secret key for your AWS account.
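For comparison, the upload that tS3Put performs can be sketched in Python with boto3, whose S3 client exposes upload_file(Filename, Bucket, Key). This is a minimal sketch, assuming the caller already created the client; the helper name upload_to_s3 and the returned s3:// URI are illustrative, and the function takes the client as a parameter so it carries no hard dependency on boto3.

```python
from typing import Any


def upload_to_s3(s3_client: Any, local_path: str, bucket: str, key: str) -> str:
    """Upload a local file to S3 and return its s3:// URI (hypothetical helper).

    `s3_client` is assumed to be a boto3 S3 client, which provides
    upload_file(Filename, Bucket, Key).
    """
    if not bucket or not key:
        raise ValueError("bucket and key must be non-empty")
    s3_client.upload_file(local_path, bucket, key)
    # The returned URI is the form a Redshift COPY ... FROM clause expects
    return f"s3://{bucket}/{key}"
```

A real client carrying the same bucket/access key/secret key details could be created with boto3.client("s3", aws_access_key_id=..., aws_secret_access_key=...); passing it in keeps the upload logic easy to test with a stand-in object.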
Once the file has been uploaded to S3, we can use tRedshiftRow to execute a COPY ... FROM command and load the data into the Redshift table.