RapidMiner

Radoop problem in executing process

Contributor II

Dear All,

I have managed to connect Hive, Spark, and Hadoop and set up a Radoop connection. I am now working with a Radoop Nest on the "Titanic" example data. I have loaded the Titanic data into Hive and want to run a Radoop validation process on it. The process fails with this error:

 

HiveQL problem Message: Error running query: java.lang.NoClassDefFoundError: scala/collection.Iterable

 

Where do you think is my problem?

 

Regards,

Maziar

5 REPLIES
Contributor II

Re: Radoop problem in executing process

Dear Maziar,

 

the issue is probably related to the Hive classpath on your Hadoop cluster. Let me ask for a few details to make troubleshooting easier:

  1. What kind of Hadoop distribution are you using? If it's CDH, do you use it with Hive on Spark? If so, setting "hive.execution.engine" to "mr" as an Advanced Hive Parameter in your connection may solve your problem immediately. It's also possible to fix the Hive on Spark execution, but it will probably require cluster-side configuration steps.
  2. Have you executed the Full Test on your Radoop connection? If not, please do so and share the logs (in case it was unsuccessful).
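
For reference, the engine switch mentioned in point 1 corresponds to a session-level HiveQL `SET` statement. The snippet below is only a sketch in Python that renders such statements from a dictionary, for example to paste into a Hive client for a manual check. The helper name `hive_set_statements` is made up for illustration; within Radoop, the Advanced Hive Parameter in the connection remains the place to set the value.

```python
def hive_set_statements(params):
    """Render Hive settings as HiveQL SET statements, one per key."""
    return [f"SET {key}={value};" for key, value in sorted(params.items())]


# The engine override suggested above; "mr" selects MapReduce execution.
overrides = {"hive.execution.engine": "mr"}

for statement in hive_set_statements(overrides):
    print(statement)
```

Running the statement in a plain Hive session before retrying the Radoop process can help tell whether the error comes from the execution engine or from somewhere else in the setup.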

 

Regards,

Zsolt

Contributor II

Re: Radoop problem in executing process

Hi Zsolt,

I changed "hive.execution.engine" to "mr", and RapidMiner responded with "The capabilites are insufficient on the data".

For the Full Test on the Radoop connection, I received an error at test number 18, "Import job into Hive". I extracted the log file of the completed test as a zip file and have attached it. Is this the right log, or do you need the log from another step?

 

Regards,

Maziar

 

Contributor II

Re: Radoop problem in executing process

Hi Maziar,

 

the logs show that the JobHistoryServer address field in your Connection has a whitespace character after "localhost". Could you re-run the Full Test with the fixed value?
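
A stray space like this is easy to miss in a form field. As a rough illustration, the Python sketch below flags values with leading or trailing whitespace; the field names in `connection_fields` are hypothetical and do not reflect how Radoop stores its connection settings.

```python
def find_whitespace_issues(fields):
    """Return the fields whose values have leading or trailing whitespace."""
    return {
        name: value
        for name, value in fields.items()
        if value != value.strip()
    }


# Hypothetical connection values; note the trailing space after "localhost".
connection_fields = {
    "namenode_address": "localhost",
    "jobhistory_server_address": "localhost ",
}

for name, value in find_whitespace_issues(connection_fields).items():
    print(f"{name}: {value!r} should be {value.strip()!r}")
```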

 

Regards,

Zsolt

Contributor II

Re: Radoop problem in executing process

Hi Zsolt,

You are absolutely right: I had a whitespace character after "localhost" in the JobHistory Server address.

I corrected that and re-ran the Full Test, but I still have the same problem at test 18 of the Radoop connection Full Test, the "Job Import" step.

Could you check the new log zip file? It is attached.

For your information about Hadoop, Hive, and YARN: I installed Hadoop and Hive myself by downloading the binaries from the Apache site and configured them from scratch, so I am not using Cloudera. It seems, however, that my configuration is incomplete and some parameters are missing or not configured.

Regards,

Maziar

 

Contributor II

Re: Radoop problem in executing process

Hi Maziar,

 

it seems that you've set a few special settings in your connection as Advanced Hadoop Parameters. Radoop automatically sets the commonly required Hadoop properties, so there is no need to define e.g. fs.default.name as an advanced parameter.

Are you using KMS on your cluster? If you have not configured it, the related properties are most likely not needed, and you can safely turn off all KMS-related settings.

In general, I'd suggest disabling every Advanced Hadoop Parameter you have in your connection and re-running the Full Test.

(By the way, are you sure that your NameNode runs on port 54310? This is quite unusual.)
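
One quick way to check the port question is to test whether anything accepts TCP connections on it. The Python sketch below is a minimal example using only the standard library; the host "localhost" is an assumption, and the function name `port_is_open` is made up for illustration. An open port only shows that some service is listening there, not that it is the NameNode.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # 54310 is the port mentioned in this thread; "localhost" is an assumption.
    host, port = "localhost", 54310
    state = "reachable" if port_is_open(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```

If the port is not reachable, the NameNode address configured on the cluster side (the value of fs.default.name mentioned above) is the place to compare against.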

 

Regards,

Zsolt