Friday, 22 February 2013

Talend: Using database sequence (postgres sql) to populate primary key or surrogate in target table using talend





Using database sequence (postgres sql) to populate primary key in target table using talend

Problem – we want to use database sequence to generate unique values for primary key column in the target table during insert.

Solution – Talend provides additional columns option in tPostgresqloutput component, which I believe would be similar for all other tdatabaseoutput components.

picture below shows target table dim_page where i have a primary key as page_skey and i want to populate it using dim_page_seq sequence created at database level.
what  we need to do is use additional columns options with sql expression as the function to get next value of sequence from database and position replace.
Also in input to this tPostgresqloutput component i pass page_skey as some dummy value using tmap, which gets overwritten by replace option.

using database sequence




input schema for above output component is as below (this schema is passed as output of tmap)



    

10 comments:

  1. Hi Arpit,

    Thanks for this great article.
    I have the same behavior on my job for an Oracle database and sequenced ids.
    Do you know how to get the replaced value for your "page_skey" inserted?
    I have to retreive this value for other treatments after insertion.

    Thanks (and sorry for my poor english)

    ReplyDelete
  2. i don't think we can retrieve this inserted value for page_skey as think how tool would send insert sql to db it will be something like below
    String insert_tPostgresqlOutput_1 = "INSERT INTO \""
    + tableName_tPostgresqlOutput_1
    + "\" (\""
    + "page_skey"
    + "\",\"page_id\",\"page_url\",\"page_title\",\"dw_updt_dtm\") VALUES ("+ "nextval('dim_page_seq')" + ",?,?,?,?)";

    As you can see it will be insert into target_table values (nextval(sequence), 'abc','other value')....
    so in short it is passing the sql to DB and not getting sequence back from DB first and then inserting it..
    what you can do is use Main row from output component and get logical key values for a row and query it back against your DB table to fetch your KEY or rather read that table again after insert and get the key using input component..

    ReplyDelete
  3. Ok, I see.
    For my problem, the solution was to use a tOracleInput object with the request "SELECT
    SEQ_MYTABLE.NEXTVAL
    FROM DUAL" and have a lookup link (loaded for each row) to my tMap. Not very optimized but it did the trick!

    ReplyDelete
    Replies
    1. How did you load it for each time?

      Delete
  4. Thanks for the article. Very useful

    ReplyDelete
  5. To get the nextVal in your job, you can pass it from a tPostgreSQLInput with a query (select nextval('seq_name') as id) and then feed it to a tMap before passing it to a tPostgreSqlOutput. You can then get the id after the insert. However if the insert in the db has failed, you have already incremented the sequence and thus you should take steps to reset the value in case of errors.

    ReplyDelete
  6. Thanks for the tip, very useful and I didn't find elsewhere how to do it, in english or french.
    Works for me on postgresql db.

    ReplyDelete
  7. Given above is not working for me. can you please help

    ReplyDelete
  8. Hi All,

    Below post will help you to generate Surrogate key using Talend.

    https://www.learningsutras.net/post/tech-generating-surrogate-key-in-talend


    Let me know if you have any doubt.

    Thank you!

    ReplyDelete
  9. Hi All,

    If you want to learn Talend from beginning please download Talend Cookbook from below post :

    https://www.learningsutras.net/post/tech-blog-talend-an-etl-tool

    Thanks,

    ReplyDelete