I have managed to connect Hive, Spark, and Hadoop and to set up the Radoop connection. I am now working with the Radoop Nest on the "Titanic" example data. I have loaded the Titanic data into Hive and want to run the Radoop Validation process on it. The process fails with this error:
HiveQL problem Message: Error running query: java.lang.NoClassDefFoundError: scala/collection/Iterable
Where do you think my problem is?
The issue is probably related to the Hive classpath on your Hadoop cluster. Let me ask for a few details to make troubleshooting easier:
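A NoClassDefFoundError on a Scala class during a Hive-on-Spark query usually means the Spark (and Scala) libraries are not visible to Hive. One common fix, sketched below for a vanilla Apache install, is to point Hive at the Spark jars in hive-env.sh; the /opt/spark location is an assumption, so adjust it to your own layout:

```shell
# hive-env.sh -- a sketch, not a verified configuration.
# /opt/spark is an assumed install location; adjust to your cluster.
export SPARK_HOME=/opt/spark
# Make the Spark/Scala jars (including scala-library) visible to Hive:
export HIVE_AUX_JARS_PATH=$SPARK_HOME/jars
```

After changing hive-env.sh, restart HiveServer2 so the new classpath takes effect.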
I changed "hive.execution.engine" to "mr", and I received a response from RapidMiner that "The capabilities are insufficient on the data".
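For reference, switching the execution engine permanently is a one-property change in hive-site.xml. The property below is standard Hive; whether it resolves Radoop's capability check is a separate question:

```xml
<!-- hive-site.xml: run queries on classic MapReduce instead of Spark/Tez -->
<property>
  <name>hive.execution.engine</name>
  <value>mr</value>
</property>
```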
In the Full Test of the Radoop connection, I received an error at test number 18, "Import job into Hive". I extracted the log file and placed the full zip file of the test in the attachment. Is this alright as a log, or is there another step I need to follow to produce the log?
You are absolutely right: I had a whitespace character after "localhost" in the JobHistory server address.
I corrected that and re-ran the full test, but I still have the same problem at test 18 of the Full Radoop connection test, the "Job Import".
Could you check the new log zip file? It is attached.
And just so you know about my Hadoop, Hive, and YARN setup: I installed Hadoop and Hive myself, by downloading the binaries from the Apache site and configuring everything from scratch, so I am not using Cloudera. It seems that what I have configured is not enough, and that some parameters are missing or misconfigured.
It seems that you've defined a few special settings in your connection as Advanced Hadoop Parameters. Radoop automatically sets the commonly required Hadoop properties, so there is no need to define e.g. fs.default.name as an advanced parameter.
Are you using KMS on your cluster? If you have not configured it, the related properties are most likely not needed, and you can safely turn off all KMS-related settings.
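If you are unsure, the configured key provider can be queried with the standard Hadoop CLI. An empty or "missing" result generally means KMS is not in use on the cluster:

```shell
# Prints the KMS key provider URI if one is configured;
# reports the key as missing otherwise.
hdfs getconf -confKey hadoop.security.key.provider.path
```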
In general, I'd suggest disabling every Advanced Hadoop Property in your connection and re-running the Full Test.
(By the way, are you sure that your NameNode runs on port 54310? This is quite unusual.)
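On a hand-built Apache cluster, the addresses Radoop should use can be double-checked with the standard Hadoop CLI; the output depends entirely on your own config files:

```shell
# Verify the addresses the cluster actually uses:
hdfs getconf -confKey fs.defaultFS                  # NameNode address and port
hdfs getconf -confKey mapreduce.jobhistory.address  # JobHistory server
jps  # NameNode, DataNode, ResourceManager, NodeManager,
     # and JobHistoryServer should all appear as running
```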