Talend - Data Integration using talend: Talend integration with Hive on hadoop

Monday, 30 December 2013

Talend integration with Hive on hadoop – Part#2 (Read data from Hive)

In my previous example, Talend integration with Hive on hadoop – Part#1, we created the external table customers_ext in my Hive database and loaded data into it.
In this example we will read data from that table. Once the data is in the Talend server's memory, we can transform or move it as needed using other Talend components.

Pre-requisites –
1) Talend integration with Hive on hadoop – Part#1

See the job below. It uses the tHiveInput component to run the SQL
"select country,count(1) from arpitdb.customers_ext group by country" against my Hive db.

The output from tHiveInput is printed using tLogRow.
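
To make the query concrete, here is a minimal Java sketch of the aggregation that "group by country" with count(1) computes. It works on an in-memory list and does not connect to Hive. The class name, method name and sample values are made up for illustration and are not part of the Talend job.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Illustrative only: mimics "select country, count(1) ... group by country"
// on an in-memory list. It does not connect to Hive.
public class CountryCountSketch {

    // Groups the country values and counts the rows in each group.
    // A TreeMap keeps the keys sorted so the printed order is stable.
    public static Map<String, Long> countByCountry(List<String> countries) {
        return countries.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical sample values standing in for the country column.
        List<String> sample = Arrays.asList("US", "UK", "US", "Japan", "US");
        // Print one "country|count" line per group.
        countByCountry(sample).forEach(
                (country, rows) -> System.out.println(country + "|" + rows));
    }
}
```

In the real job Hive does this work on the cluster, and tHiveInput only receives the already aggregated rows.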

See the screenshots below for more details.

The output of the job run is given below. Hive first runs the SQL, which internally launches a map/reduce job, and then returns the results to Talend.

Starting job job_for_blog at 18:06 30/12/2013.

[statistics] connecting to socket on port 3803
[statistics] connected
.------------+-----------.
|       tLogRow_1        |
|=-----------+----------=|
|country     |countofrows|
|=-----------+----------=|
|Australia   |1034       |
|Canada      |1004       |
|Chile       |1047       |
|China       |1002       |
|France      |971        |
|Germany     |1004       |
|Japan       |989        |
|Russia      |1012       |
|South Africa|935        |
|UK          |1002       |
|US          |10000      |
|country     |1          |
'------------+-----------'

[statistics] disconnected

Job job_for_blog ended at 18:07 30/12/2013. [exit code=0]
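
The map/reduce job that Hive runs for this query can be pictured roughly as in the sketch below: the map phase emits a (country, 1) pair per row, and the reduce phase sums the pairs for each key. This is a single-process illustration, not Hive's actual code. The file layout (country as the first comma-separated field) and the sample lines are assumptions. The sample includes a header line on purpose: one possible reading of the last output row above (country | 1) is that the header line of the source file was loaded as data, although the post does not confirm this.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a single-process picture of the map, shuffle and
// reduce phases behind a "group by country" count. Not Hive's actual code.
public class GroupByPhasesSketch {

    // Map phase: emit (country, 1) for every input line.
    // Assumes the country is the first comma-separated field (hypothetical layout).
    public static List<Map.Entry<String, Long>> mapPhase(List<String> lines) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (String line : lines) {
            String country = line.split(",")[0];
            pairs.add(new SimpleEntry<>(country, 1L));
        }
        return pairs;
    }

    // Shuffle + reduce: bring pairs with the same key together and sum them.
    public static Map<String, Long> reducePhase(List<Map.Entry<String, Long>> pairs) {
        Map<String, Long> totals = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Hypothetical file contents; the first line is a header.
        List<String> lines = Arrays.asList(
                "country,name", "US,Ann", "UK,Bob", "US,Cy");
        // The header is counted like any other row, giving a "country" group of 1.
        System.out.println(reducePhase(mapPhase(lines)));
    }
}
```

With these sample lines the header shows up as its own group with a count of 1, sorted after the uppercase country names, which matches the shape of the job output.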

1 comment:

Unknown, 25 April 2014 at 05:29
Hi,
I did the same as you described above, but I got this error:

Exception in component tHiveRow_1
java.sql.SQLException: org.apache.thrift.TApplicationException: Invalid method name: 'execute'
at org.apache.hadoop.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:191)
at org.apache.hadoop.hive.jdbc.HiveStatement.execute(HiveStatement.java:127)
at org.apache.hadoop.hive.jdbc.HiveConnection.configureConnection(HiveConnection.java:126)
at org.apache.hadoop.hive.jdbc.HiveConnection.<init>(HiveConnection.java:121)
at org.apache.hadoop.hive.jdbc.HiveDriver.connect(HiveDriver.java:104)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at mirth.sample_hive_0_1.Sample_Hive.tHiveRow_1Process(Sample_Hive.java:468)
at mirth.sample_hive_0_1.Sample_Hive.runJobInTOS(Sample_Hive.java:797)
at mirth.sample_hive_0_1.Sample_Hive.main(Sample_Hive.java:663)

Please help me to resolve this issue.

Thanks in Advance

Thirupathi