Monday 30 December 2013

Talend integration with Hive on hadoop – Part#2 (Read data from Hive)

In my previous example, Talend integration with Hive on hadoop – Part#1, we created the external table customers_ext in my Hive database and loaded data into it.
In this example we will read data from that table. Once the data is in Talend server memory, we can transform or move it as needed using other Talend components.

Pre-requisites –
1)    Talend integration with Hive on hadoop – Part#1

See the job below. It uses the tHiveInput component to run the SQL
"select country,count(1) from arpitdb.customers_ext group by country" against my Hive db.

The output from tHiveInput is printed using tLogRow.
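
To try the logic of this query without a Hive cluster, here is a minimal sketch that runs the same GROUP BY against an in-memory SQLite table. SQLite only stands in for Hive for illustration. The sample rows, the two-column layout (name, country) and the function name count_by_country are assumptions and are not taken from the real customers_ext table. An ORDER BY is added so the result order is predictable.

```python
import sqlite3

# Hypothetical sample rows; the real customers_ext table lives in Hive
# and its full column list is not shown in this post.
SAMPLE_ROWS = [
    ("Alice", "Canada"),
    ("Bob", "Canada"),
    ("Chen", "China"),
    ("Dana", "France"),
    ("Eve", "Canada"),
]


def count_by_country(rows):
    """Run the post's GROUP BY query against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers_ext (name TEXT, country TEXT)")
        conn.executemany("INSERT INTO customers_ext VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT country, COUNT(1) AS countofrows "
            "FROM customers_ext GROUP BY country ORDER BY country"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for country, countofrows in count_by_country(SAMPLE_ROWS):
        print(f"{country}|{countofrows}")
```

Each returned tuple corresponds to one row that tLogRow would print in the Talend job: the country and the number of rows for that country.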

See the screenshots below for more details.

The output of the job run is given below. Hive first runs the SQL, which internally launches a map/reduce job, and then returns the results to Talend.

Starting job job_for_blog at 18:06 30/12/2013.

[statistics] connecting to socket on port 3803
[statistics] connected
|       tLogRow_1        |
|country     |countofrows|
|Australia   |1034       |
|Canada      |1004       |
|Chile       |1047       |
|China       |1002       |
|France      |971        |
|Germany     |1004       |
|Japan       |989        |
|Russia      |1012       |
|South Africa|935        |
|UK          |1002       |
|US          |10000      |
|country     |1          |

[statistics] disconnected

Job job_for_blog ended at 18:07 30/12/2013. [exit code=0]
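
The map/reduce job that Hive launches for this kind of query can be pictured with a small conceptual sketch. This is not Hive's actual execution plan; it only shows the general idea of a group-by count in map/reduce terms. The function names and the sample records are made up for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Emit a (country, 1) pair for every input record."""
    for record in records:
        yield record["country"], 1


def shuffle(pairs):
    """Group the emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the ones for each country to get its row count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_rows_by_country(records):
    """Chain map, shuffle and reduce into one group-by count."""
    return reduce_phase(shuffle(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical records standing in for rows of customers_ext.
    sample = [{"country": "UK"}, {"country": "US"}, {"country": "UK"}]
    print(count_rows_by_country(sample))
```

On a real cluster the map and reduce steps run in parallel on different nodes, which is why the job takes a while to start even for a small table.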

1 comment:

  1. Hi,
    I did the same as you described above, but I got this error:

    Exception in component tHiveRow_1
    java.sql.SQLException: org.apache.thrift.TApplicationException: Invalid method name: 'execute'
    at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(
    at org.apache.hadoop.hive.jdbc.HiveStatement.execute(
    at org.apache.hadoop.hive.jdbc.HiveConnection.configureConnection(
    at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(
    at org.apache.hadoop.hive.jdbc.HiveDriver.connect(
    at java.sql.DriverManager.getConnection(
    at java.sql.DriverManager.getConnection(
    at mirth.sample_hive_0_1.Sample_Hive.tHiveRow_1Process(
    at mirth.sample_hive_0_1.Sample_Hive.runJobInTOS(
    at mirth.sample_hive_0_1.Sample_Hive.main(

    Please help me to resolve this issue.

    Thanks in Advance

