"Radoop Error: could not upload the necessary components to the directory of HDFS"
Hi,
I am having a problem with this Radoop error: "could not upload the necessary components to the directory of HDFS". It says that Radoop can't upload into the directory '/tmp/radoop/27eca174758add21906d3b197af684e7/'.
So I changed the permissions of '/tmp/radoop/' and of '/tmp/radoop/27eca174758add21906d3b197af684e7/' on the NameNode in the VM, and then I ran 'hadoop fs -ls /tmp/radoop'; the output showed that the permissions had been changed. So I went ahead and re-ran the process that contains the Radoop Nest, and the same error popped up again, and the permissions of the directory '/tmp/radoop' had automatically reverted to what they were before.
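In case it matters, the commands I used on the VM were roughly the following (exact mode bits from memory):

hadoop fs -chmod -R 777 /tmp/radoop    # open up the Radoop temp directory and everything below it
hadoop fs -ls /tmp/radoop              # verify that the permissions changed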
Could someone give me some pointers, please? I am using RapidMiner Studio with the Radoop extension to connect to a dummy single-node Hadoop cluster in a local VM.
FYI, I was able to connect to the cluster and was also able to explore all the Hive tables.
Thanks heaps!
Answers
This error almost always indicates a problem with the connection setup. To inspect that, please run the so-called "Full Test" on the Manage Radoop Connections dialog for your connection entry. You can do this by opening the connection entry, enabling the Full Test checkbox below the Test button, and then clicking the Test button. This runs a series of tests and tries every type of communication with the cluster, including a flat file upload to HDFS. It is good practice to always run it after defining a new connection.
Uploading to HDFS requires remote access to all DataNodes from the machine that runs RapidMiner Radoop. The HDFS must be configured to be accessible remotely. That means that, in the case of a single-node cluster, the NameNode must use a hostname and IP that can be accessed remotely, and DNS and reverse DNS must work for these addresses. So, for example, neither localhost nor 127.0.0.1 will work remotely, and an IP address may only work if reverse DNS translates it into the appropriate hostname (and may not work for all distributions).
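A quick way to check this from the machine that runs RapidMiner Studio is something like the following (using the hostname and IP from your setup; substitute your own values):

nslookup quickstart.cloudera    # forward DNS: the hostname should resolve to the cluster's remote IP
nslookup 192.168.1.119          # reverse DNS: the IP should resolve back to the same hostname
ping quickstart.cloudera        # the host must also be reachable from the Studio machine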
If the upload test fails, please navigate to the NameNode web interface (usually on port 50070) and check whether the NameNode is configured to a hostname:port address that is accessible remotely. The ping command should be able to resolve that hostname to the IP address. Please also check on this page that Safemode is off. Please note that the HDFS address is determined during the Hadoop cluster install and usually cannot be changed later without a cluster reinstall.
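On the cluster itself you can check both of these from the command line as well, for example:

hdfs dfsadmin -safemode get            # should print "Safe mode is OFF"
hdfs getconf -confKey fs.defaultFS     # shows the configured HDFS address (the host:port that clients must be able to reach)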
I hope that this helps.
For any further problems: please describe which test fails (Connection Test Log), which distribution you use, and what your connection settings are.
Best,
Peter
Great help.
1. I think my HDFS was configured to be accessible remotely, because I was able to test the connection to it (not the Full Test) and explore the tables in Hive.
2. The Full Test throws an exception; please find the connection log below. I am using quickstart.cloudera as the single-node Hadoop cluster. Could this be the reason (does it need more than one DataNode?) that the exception was thrown?
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/radoop/27eca174758add21906d3b197af684e7/import_data_integration/00050 could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
Really appreciate your help, Peter!
Connection log as shown below:
=========================
[Mar 25, 2015 2:21:49 PM]: --------------------------------------------------
[Mar 25, 2015 2:21:49 PM]: Connection test for 'quickstart.cloudera' started.
[Mar 25, 2015 2:21:49 PM]: Hive server 2 connection (192.168.1.119:10000) test started.
[Mar 25, 2015 2:21:49 PM]: Hive server 2 connection test succeeded.
[Mar 25, 2015 2:21:49 PM]: Retrieving required configuration properties...
[Mar 25, 2015 2:21:50 PM]: Successfully fetched property: yarn.resourcemanager.scheduler.address
[Mar 25, 2015 2:21:50 PM]: Successfully fetched property: yarn.resourcemanager.resource-tracker.address
[Mar 25, 2015 2:21:50 PM]: Successfully fetched property: yarn.resourcemanager.admin.address
[Mar 25, 2015 2:21:50 PM]: Successfully fetched property: yarn.application.classpath
[Mar 25, 2015 2:21:50 PM]: Successfully fetched property: mapreduce.jobhistory.address
[Mar 25, 2015 2:21:50 PM]: Distributed file system test started.
[Mar 25, 2015 2:21:50 PM]: Distributed file system test succeeded.
[Mar 25, 2015 2:21:50 PM]: MapReduce test started.
[Mar 25, 2015 2:21:50 PM]: MapReduce test succeeded.
[Mar 25, 2015 2:21:50 PM]: Radoop temporary directory test started.
[Mar 25, 2015 2:21:50 PM]: Radoop temporary directory test succeeded.
[Mar 25, 2015 2:21:50 PM]: MapReduce staging directory test started.
[Mar 25, 2015 2:21:50 PM]: MapReduce staging directory test succeeded.
[Mar 25, 2015 2:21:50 PM]: Connection test for 'quickstart.cloudera' completed successfully.
[Mar 25, 2015 2:21:50 PM]: --------------------------------------------------
[Mar 25, 2015 2:21:50 PM]: Integration test for 'quickstart.cloudera' started.
[Mar 25, 2015 2:21:50 PM]: The test may require several minutes to complete.
[Mar 25, 2015 2:21:50 PM]: Distributed file system upload started.
[Mar 25, 2015 2:21:50 PM]: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/radoop/27eca174758add21906d3b197af684e7/import_data_integration/00050 could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1504)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3065)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:615)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:188)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
at .........
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:361)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1437)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1260)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:525)
[Mar 25, 2015 2:21:50 PM] SEVERE: File /tmp/radoop/27eca174758add21906d3b197af684e7/import_data_integration/00050 could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1504)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3065)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:615)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:188)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:476)
at ..........
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
[Mar 25, 2015 2:21:50 PM] SEVERE: Test data upload to the distributed file system failed. Please check that the NameNode and the DataNodes run and are accessible.
The quickstart VM is unfortunately not remotely accessible this way.
(You can check this by running "cat /etc/hosts" on the VM. The single entry is for 127.0.0.1.)
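It typically looks something like this (the exact aliases may differ):

$ cat /etc/hosts
127.0.0.1   quickstart.cloudera   quickstart   localhost   localhost.localdomain

so quickstart.cloudera resolves to the loopback address and is only reachable from inside the VM.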
Hive is accessible remotely, but other services are not.
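If you want to double-check, you can run the following on the VM:

hdfs dfsadmin -report    # lists the DataNode and the address it registers with

The address reported there is the one a remote client has to reach for block writes; if it is 127.0.0.1, or a name that resolves to 127.0.0.1 from your host machine, the upload cannot work, which is exactly the "could only be replicated to 0 nodes" error you see.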
However, I recommend that you install RapidMiner and Radoop in the VM. It should run on the VM and connect to the local cluster without any problem.
I hope that this helps.
Best,
Peter
That's great help,
Just to confirm: I should install RapidMiner Server and then RapidMiner Radoop on RapidMiner Server, is that right?
Another question: if I set up a multi-node Hadoop cluster from scratch without using the Cloudera quickstart VM, will I be able to access the VMs remotely, since the /etc/hosts file would have multiple entries?
Really appreciate your help! Pete
No, I suggest installing RapidMiner Studio on the Cloudera VM and using it through the VM's desktop environment. Of course, in this case your local files (like CSV files) must either be put into a shared directory or uploaded to the VM if you want to use them.
If you install RapidMiner Server on the Cloudera VM, that will also work, but in that case you can only run the processes on the Server and not from the Studio on your host machine.
Radoop should work with a multi-node cluster; just please pay attention to the networking requirements. See, for example:
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_ig_cm_requirements.html#cmig_topic_4_3_3_unique_1
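Regarding your /etc/hosts question: yes, on a multi-node cluster every node (and the machine running Radoop) must be able to resolve every other node's hostname to its real, routable IP, and never to 127.0.0.1. As an illustration only (the hostnames and IPs below are placeholders), the hosts file on each node would look something like:

192.168.1.101   hadoop-master.example.com    hadoop-master
192.168.1.102   hadoop-worker1.example.com   hadoop-worker1
192.168.1.103   hadoop-worker2.example.com   hadoop-worker2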
Best,
Peter