RapidMiner

connecting hadoop with radoop

SOLVED
Hi

I have just started using RapidMiner, and I'm also a beginner with Hadoop and everything around it.

I wanted to ask about the steps to connect Hadoop 2.7.1 to RapidMiner (the newest version) on Ubuntu 15.10.

I have already added the Radoop extension, and it installed without problems.

After that, when I tried to connect Hadoop with Radoop, I ran into the following issues:

Where can I get the "master address" name? I have already read about it but couldn't figure it out.

http://docs.rapidminer.com/radoop/installation/configuring-radoop-connections.html

 

Moreover, I can't install Spark 1.6 with my version of Apache Hadoop (2.7.1) because it isn't compatible. The only compatible version is Spark 2.0, and I don't have the option to select it in the connection window,

whereas RapidMiner expects Spark 1.6.

So which one should I install? And should I connect Spark to Hadoop, or just install Spark without configuring it in Hadoop?

http://spark.apache.org/downloads.html

 

Do I need to download and install Hive to have a proper connection with Hadoop, or is it not mandatory?

And what is HiveServer2 needed for? Are Hive and HiveServer2 the same thing?

 

Thank you so much

 

Regards,

Ebtesam

3 REPLIES

Re: connecting hadoop with radoop

Your master address is the IP address or a fully qualified name (like server.corp.com) of the master node in your cluster.

If it is a single-node cluster, then you can use the IP address or the name of that node.
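On a plain Apache Hadoop install, one place to look up that host name is the `fs.defaultFS` property in `core-site.xml`. The snippet below is a minimal sketch under that assumption: the helper name `master_host_from_core_site` and the sample XML are made up for illustration, and whether this is exactly the value Radoop expects on your cluster should be checked against the Radoop documentation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def master_host_from_core_site(xml_text):
    """Return the host part of fs.defaultFS (or the older fs.default.name), or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name in ("fs.defaultFS", "fs.default.name"):
            value = (prop.findtext("value") or "").strip()
            # The value looks like hdfs://host:port, so the host is the URL hostname.
            return urlparse(value).hostname
    return None


# Hypothetical single-node configuration, for illustration only.
sample = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>"""

print(master_host_from_core_site(sample))
```

To use it on a real installation, read the contents of your own `core-site.xml` (commonly under the Hadoop configuration directory) and pass the text to the function.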

 

As far as Spark goes, RapidMiner can work with Spark only on Hadoop, so you will need to install Spark.

 

What flavor of Hadoop are you using? Apache? Cloudera? Hortonworks?

If you are just trying things out, your easiest bet is to use the VMs provided by Cloudera or Hortonworks.

Re: connecting hadoop with radoop


bhupendra_patil wrote:

Your master address is the IP address or a fully qualified name (like server.corp.com) of the master node in your cluster.

If it is a single-node cluster, then you can use the IP address or the name of that node.

As far as Spark goes, RapidMiner can work with Spark only on Hadoop, so you will need to install Spark.

What flavor of Hadoop are you using? Apache? Cloudera? Hortonworks?

If you are just trying things out, your easiest bet is to use the VMs provided by Cloudera or Hortonworks.


Thank you for your immediate reply.

I'm using Apache Hadoop.

 

As for Spark, which version should I download? The "Configuring Radoop Connections" page

http://docs.rapidminer.com/radoop/installation/configuring-radoop-connections/

says that it has to be version 1.6 or 1.5,

but the version that matches my Hadoop installation (2.7.1) is Spark 2.0.

So which one should I download and install?

 

As for Apache Hive, I haven't found a Hive version that is compatible with Java 8.

The Hive versions seem to be compatible only with Java 7.

So how can I install Hive?

https://cwiki.apache.org/confluence/display/Hive/GettingStarted

 

 

RMStaff

Re: connecting hadoop with radoop

Hi Ebtesam,

 

The Spark 1.6 build for Hadoop 2.6 will work perfectly with Hadoop 2.7.1 as well.

You can download that to your cluster, and provide the HDFS (or local) path in the appropriate Radoop connection setting.
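As a rough illustration of which download to pick, the sketch below builds the archive name of a pre-built Spark package and rejects versions outside the 1.5/1.6 range mentioned in this thread. The helper `spark_archive_name` is hypothetical, the `spark-<version>-bin-hadoop2.6.tgz` naming pattern is an assumption based on how the Apache Spark pre-built packages are usually named, and `1.6.2` is only an example version.

```python
def spark_archive_name(spark_version, hadoop_profile="hadoop2.6"):
    """Build the assumed archive name for a pre-built Spark package.

    Only Spark 1.5.x and 1.6.x are accepted, following the versions
    named in this thread.
    """
    major_minor = ".".join(spark_version.split(".")[:2])
    if major_minor not in ("1.5", "1.6"):
        raise ValueError(f"Spark {spark_version} is outside the 1.5/1.6 range")
    return f"spark-{spark_version}-bin-{hadoop_profile}.tgz"


print(spark_archive_name("1.6.2"))  # prints spark-1.6.2-bin-hadoop2.6.tgz
```

After unpacking such a package on the cluster, the path to it (on HDFS or on the local file system) is what goes into the Spark-related field of the Radoop connection.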

 

Also, Apache Hive will work on Java 8. Basically, you can expect almost anything that supports Java 7 to work on Java 8.

 

Peter