Dear all,


I have recently launched an EC2 instance running CDH 5.11. All services seem to be up and running, and the installation has passed several validation tests.


I have also installed RapidMiner Studio and the Radoop extension on my desktop, and I am now trying to connect to my Hadoop cluster. The EC2 instance is not configured to use Elastic IPs; I am using tunnels through an SSH session.


I am currently trying to pass the full test to validate the connection. Initially, the configuration was imported from Cloudera Manager; I then modified several properties to match my environment. The Hive, Java version, MapReduce and NameNode networking tests pass, but I am stuck at the upload of a jar file to HDFS. I suspect the problem is related to these earlier warnings from the DataNode networking test:


 WARNING: Reverse DNS lookup failed! Expected hostname for ip <public-ip>: <fqdn>, but received <public DNS>.

 WARNING: DataNode port 50010 on the ip/hostname <fqdn> cannot be reached. Please check that you can access the DataNodes of your cluster.
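The first warning can be reproduced outside Radoop with a plain reverse lookup. The following is only a minimal sketch, not Radoop's actual check: the function name `reverse_dns_matches` and its `resolver` parameter are hypothetical, and the comparison rule (case-insensitive, ignoring a trailing dot) is an assumption.

```python
import socket


def reverse_dns_matches(ip, expected_hostname, resolver=socket.gethostbyaddr):
    """Reverse-resolve ip and report whether expected_hostname is among the names.

    Returns (ok, names). The resolver argument is a hypothetical hook so the
    function can be exercised without touching real DNS.
    """
    try:
        primary, aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror / socket.gaierror are OSError subclasses:
        # no PTR record, or the lookup failed altogether.
        return False, []
    names = [primary] + list(aliases)
    expected = expected_hostname.rstrip(".").lower()
    ok = any(name.rstrip(".").lower() == expected for name in names)
    return ok, names
```

Calling `reverse_dns_matches("<public-ip>", "<fqdn>")` from the desktop should return `False` together with the public DNS name in a situation like the one in the warning, since the public IP of an EC2 instance typically reverse-resolves to its public DNS name rather than to the internal hostname.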


I believe the tunnel on port 50010 is working fine, but there is something I am missing. The output of the netstat command shows that this port is listening on all IPs (


Things I have tried:


- Edited my local hosts file to resolve the public IP to the internal server hostname. Radoop then complains that the server is unreachable.

- Formatted the NameNode after deleting all data in the HDFS data directory.

- Set dfs.client.use.datanode.hostname and dfs.datanode.use.datanode.hostname to true in the client configuration.

- Tried to upload a file using another client such as Toad. Same error.

- Tried to set dfs.datanode.address on the server to hostname:port, but Cloudera Manager does not allow this; only the port number can be set.

- Edited dfs.datanode.address in the client configuration; this does not change Radoop's behaviour.


The error when trying to upload the jar file is the following:

[----] SEVERE: File /tmp/radoop/_shared/db_default/radoop_hive-v4_UPLOADING_1498636293395_dy8gaul.jar could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.


Somehow the client knows the number of DataNodes in the HDFS service. Does that mean the SSH tunnel on port 50010 is working fine? Can someone point me in the right direction?
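As far as I understand it, the DataNode count in that message is reported by the NameNode, so it says nothing about whether the client can reach port 50010; "excluded" usually means the client gave up on that DataNode after failing to connect to it. A quick way to test reachability from the desktop is a plain TCP connect. This is a minimal sketch with a hypothetical helper name, `port_reachable`, and an arbitrary default timeout:

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Note that `port_reachable("localhost", 50010)` returning `True` against a local SSH forward only shows that the ssh client is listening locally. The check that matters is against the address the DataNode is advertised under, that is, `port_reachable("<fqdn>", 50010)`, because that is where the HDFS client will try to write the blocks.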


Thank you!!


    pau_fernandez_q Member Posts: 2 Contributor I

    Hi phellinger,


    Thanks a lot, this was helpful. I had not read this documentation and was trying a thousand tunnels.


    I am now able to pass the quick test. The full test fails at the Hive table load. The error tells me to check user permissions for LOAD or CREATE statements, which I have already done, and they seem to be OK.


    Can you point me in the right direction?


    Thank you in advance!



    phellinger Employee, Member Posts: 103 RM Engineering

    Hi Pau,




    The Hive load test uploads an HDFS file to a temp dir, and uses the LOAD DATA Hive statement that will effectively move the file to the Hive warehouse directory.
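To make the second step concrete, the statement has roughly the shape built below. This is only an illustrative sketch: the helper `build_load_statement`, the path and the table name are hypothetical and are not what Radoop generates internally.

```python
def build_load_statement(hdfs_path, table, overwrite=False):
    """Build a Hive LOAD DATA statement that moves an HDFS file into a table."""
    if "'" in hdfs_path:
        # Quoting is deliberately not handled in this sketch.
        raise ValueError("single quotes in the path are not supported here")
    mode = "OVERWRITE INTO" if overwrite else "INTO"
    return f"LOAD DATA INPATH '{hdfs_path}' {mode} TABLE {table}"


# Hypothetical temp file and table name, for illustration only.
print(build_load_statement("/tmp/radoop/example/data.csv", "radoop_test_table"))
```

Because the file is moved rather than copied, the user that executes the statement needs write access to both the source directory and the warehouse directory, so permissions on both sides are worth checking.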

    If you enable the Log panel in Studio (View -> Show Panel -> Log) and set the log level (right click on the panel -> Set log level -> FINER), you will see the details.


    Can you share more details (log) in PM or here?




