RapidMiner

Radoop connection issues in v7.3

Contributor II

Hi,

 

I recently upgraded from RapidMiner v7.2 to v7.3. Since the upgrade, Radoop throws java.util.concurrent.TimeoutException while connecting to HiveServer2. In another RapidMiner installation (v7.2), the same configuration works fine.

 

Current config details:

Hadoop version: Apache Hadoop 2.2+

Hadoop user name: hadoop

Hive Server2 (Hive 0.13 or newer)

default

 

hive

Spark 1.6

hdfs:///user/spark/spark-assembly.jar

 

Are there any configuration changes to be made in Radoop for v7.3? I have tried RapidMiner v7.3 + Radoop 7.2 as well as RapidMiner v7.3 + Radoop 7.3. Neither combination works. Please help.

 

-Kris

 

3 REPLIES
RMStaff

Re: Radoop connection issues in v7.3

Hi Kris,

 

It would be a bit surprising if Studio 7.2 and 7.3 behaved differently with the same Radoop version. (So it is valuable if we find such a case. ;) ) Can you reproduce this behaviour consistently?

 

I'll copy my answer on how to move on with the problem from another topic.

 

The error states that there was no response from the HiveServer2 instance (specified by either the Master Address or the Hive Server Address field, and the Hive Port) within the given time.

I would try the following:

  • Check the Hive log on the cluster. Does the SHOW TABLES command that the test sends appear in the log? (It can take several seconds on the first try.) If it does, Hive is reachable, but it may be taking longer to respond than the timeout allows.
  • If the log shows that the command was sent to Hive, then you can increase the timeout in Studio: go to Preferences -> Radoop, and increase the Connection timeout and Hive command timeout values to, let's say, 30. (These timeouts are used for detecting connection problems.)
  • If there is nothing in the log, then I would make sure that the specified address and port can be reached from the machine that runs Studio. If that works, I would check the health of Hive on the cluster from Beeline, for example.
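The reachability check in the last point can be sketched as a small standalone Java program, run from the machine that hosts Studio. This is only an illustration: the class name PortCheck and its isReachable helper are made up for this example and are not part of Radoop or Studio. It opens a plain TCP connection and deliberately bypasses any JVM proxy settings, so it only tells you whether the direct network path to the Hive host and port is open.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.Socket;

// Hypothetical helper for illustration; not part of RapidMiner or Radoop.
class PortCheck {

    /** Returns true if a direct TCP connection to host:port succeeds within timeoutMs. */
    static boolean isReachable(String host, int port, int timeoutMs) {
        // Proxy.NO_PROXY forces a direct connection, ignoring JVM proxy settings.
        try (Socket socket = new Socket(Proxy.NO_PROXY)) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unresolvable host names.
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: java PortCheck <hive-host> <hive-port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        boolean ok = isReachable(host, port, 5000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```

Pass the same address and port that are entered in the Radoop connection. A successful result only shows that the port accepts TCP connections; it does not prove that HiveServer2 itself is healthy, which is what the Beeline check is for.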

 

Best,

Peter

Contributor II

Re: Radoop connection issues in v7.3

Thanks, Peter, for the response.

 

Yes, the behaviour is consistently reproducible. Yesterday, I created an Amazon EMR cluster and tried connecting to it through Radoop. The same issue persists even if I open all inbound ports on the EMR master instance.

 

All URLs (NameNode, history server, Spark, etc.) are accessible remotely. Only the HiveServer2 connection fails. I had already tried increasing the timeouts to up to 4 minutes, but no luck. Hive works through Beeline (tested locally on the cluster).

 

 

Let me know if there are any other tests I can try out. 

 

 

Contributor II

Re: Radoop connection issues in v7.3

Peter, 

Figured out the issue and resolved it.

 

The problem is a change made in RapidMiner v7.3 in the System -> Preferences option. Earlier, under System, one had to explicitly specify an HTTP proxy, and the default was no proxy. In the new version, the proxy is a separate option (under System -> Preferences) and it defaults to 'System proxy'. Once I changed it to Direct (no proxy), it worked fine. I think the default option should be no proxy.
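For anyone who wants to see what a 'System proxy' setting might resolve to on their machine, here is a rough sketch using the standard java.net.ProxySelector API. It is an assumption on my part that Studio's 'System proxy' option behaves like the JVM's java.net.useSystemProxies mechanism; I have not checked how Studio implements it. The class name ProxyCheck and its methods are made up for this example.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; not part of RapidMiner or Radoop.
class ProxyCheck {

    /** Asks the JVM's default ProxySelector which proxies it would use for a raw socket to host:port. */
    static List<Proxy> proxiesFor(String host, int port) {
        ProxySelector selector = ProxySelector.getDefault();
        if (selector == null) {
            // No selector installed: connections are made directly.
            return Collections.singletonList(Proxy.NO_PROXY);
        }
        return selector.select(URI.create("socket://" + host + ":" + port));
    }

    /** True if every entry means "connect directly", i.e. no proxy is involved. */
    static boolean isDirect(List<Proxy> proxies) {
        return proxies.stream().allMatch(p -> p.type() == Proxy.Type.DIRECT);
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.out.println("usage: java -Djava.net.useSystemProxies=true ProxyCheck <hive-host> <hive-port>");
            return;
        }
        List<Proxy> proxies = proxiesFor(args[0], Integer.parseInt(args[1]));
        System.out.println("Proxies selected by the JVM: " + proxies);
        System.out.println(isDirect(proxies) ? "Direct connection" : "Traffic would go through a proxy");
    }
}
```

Run it with -Djava.net.useSystemProxies=true and the Hive host and port from the Radoop connection. If the output lists a proxy rather than a direct connection, that would be consistent with the timeout described above, though it does not prove that Studio takes the same route.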

 

Sharing this as it might help others who face similar issues after upgrading.

 

-Kris