12-29-2016 10:11 AM
I'm trying to add a Radoop connection with the Hortonworks Sandbox on Azure, but I'm stuck at test 10/11: UDF jar upload. I am a newbie in Hadoop and would greatly appreciate any help or advice.
I'm getting the following error message:
[Dec 29, 2016 4:01:29 PM]: Running test 10/11: UDF jar upload
[Dec 29, 2016 4:01:29 PM]: File uploaded: 97.04 KB written in 0 seconds (12.01 MB/sec)
[Dec 29, 2016 4:01:51 PM] SEVERE: File /tmp/radoop/_shared/db_default/radoop_hive-v4_UPLOADING_1483023689486_pisyasi.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at java.security.AccessController.doPrivileged(Native Method)
[Dec 29, 2016 4:01:52 PM] SEVERE: Test failed: UDF jar upload
01-02-2017 10:26 AM
There is another (solved) topic about connecting to the Sandbox on Azure; see the solution below.
Please add this advanced Hadoop property to the connection with the value true: dfs.client.use.datanode.hostname. (In this case, the DataNode is expected to be accessed via sandbox.hortonworks.com.)
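For illustration only: in Radoop this property is added as an advanced Hadoop parameter on the connection, but the same client-side setting in a plain Hadoop client would live in hdfs-site.xml. This sketch assumes a standard Hadoop client configuration file; only the property name and value come from the advice above.

```xml
<!-- hdfs-site.xml (client side): tell the HDFS client to connect to
     DataNodes by hostname (e.g. sandbox.hortonworks.com) instead of the
     internal IP address that the NameNode reports back. -->
<property>
  <name>dfs.client.use.datanode.hostname</name>
  <value>true</value>
</property>
```

This matters because the NameNode hands the client the DataNode's address as seen from inside the cluster; with this flag, the client resolves the DataNode's hostname itself, which can point to the externally reachable IP.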
01-04-2017 06:21 AM
Thanks a lot for your reply. I previously tried the solution you suggested, but I'm still getting stuck at the same test. I'm getting the following error:
[Jan 4, 2017 12:16:39 PM] SEVERE: DataStreamer Exception:
[Jan 4, 2017 12:16:39 PM] SEVERE: Test failed: UDF jar upload
01-04-2017 06:29 AM
If the error message is different, the property may have had an effect, but further errors remain.
The upload to HDFS was unsuccessful: the file could not be replicated to any nodes. The NameNode web interface (typically accessible on port 50070 via a browser) usually shows whether the DataNodes are unhealthy (Live Nodes vs. Dead Nodes). This is the case, for example, when the disks are full, or when any other problem prevents the NameNode or DataNodes from functioning.
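Besides the web UI, the same live/dead DataNode counts are exposed by the NameNode's JMX endpoint (typically http://&lt;namenode&gt;:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState). A minimal sketch of checking them programmatically, using a hypothetical sample of that JSON response rather than a live cluster:

```python
import json

# Hypothetical sample of the FSNamesystemState JMX bean; a real check would
# fetch this JSON from the NameNode's /jmx endpoint over HTTP.
sample = """
{
  "beans": [
    {
      "name": "Hadoop:service=NameNode,name=FSNamesystemState",
      "NumLiveDataNodes": 1,
      "NumDeadDataNodes": 0
    }
  ]
}
"""

def datanode_health(jmx_json):
    """Return (live, dead) DataNode counts from an FSNamesystemState bean."""
    bean = json.loads(jmx_json)["beans"][0]
    return bean["NumLiveDataNodes"], bean["NumDeadDataNodes"]

live, dead = datanode_health(sample)
print("live:", live, "dead:", dead)  # → live: 1 dead: 0
```

If live is 0 (or dead is nonzero), no upload can satisfy even minReplication=1, which matches the error in the log above.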
01-26-2017 05:57 AM
We have updated the guide for connecting to the latest Hortonworks Sandbox virtual machine. Following the steps thoroughly should solve the issues above.
Please follow the guide at http://docs.rapidminer.com/radoop/installation/dis
For those interested in the technical details, here is some explanation. The Hortonworks Sandbox connection problems appeared when Hortonworks updated their Sandbox environment so that Hadoop now runs in Docker inside VirtualBox. After this change in the networking, a hostname must be used to access the DataNodes, because it can resolve to either the external or the internal IP depending on where it is resolved. Moreover, not all ports are exposed properly, which is why the permanent iptables rules are needed as a workaround.
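As a concrete sketch of the hostname mechanism: on the client machine, the Sandbox hostname is mapped to the externally reachable VM address, while inside the Docker network the same name resolves to the container's internal IP. The IP below is hypothetical; substitute whatever address your Sandbox VM actually has.

```text
# /etc/hosts on the client machine
# (on Windows: C:\Windows\System32\drivers\etc\hosts)
# 192.168.56.101 is a placeholder for the Sandbox VM's external IP.
192.168.56.101  sandbox.hortonworks.com
```

Because both sides resolve the same name to the IP that is reachable from their own side, dfs.client.use.datanode.hostname=true makes the client's DataNode connections work through this split resolution.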