Connecting Hadoop with Radoop
Hi
I have just started using RapidMiner (I'm a beginner), and I'm also new to Hadoop and the surrounding ecosystem.
I wanted to ask about the steps to connect Hadoop 2.7.1 with RapidMiner (the newest version) on Ubuntu 15.10.
I have already added the Radoop extension, and it installed perfectly.
After that, when I tried to connect Hadoop with Radoop, I ran into issues with the following:
Where can I get the "master address"? I have already read about it, but I couldn't figure it out:
http://docs.rapidminer.com/radoop/installation/configuring-radoop-connections.html
Moreover, Spark 1.6 does not seem to work with my version of Apache Hadoop (2.7.1); the only compatible version is Spark 2.0, and I don't have the option to select it in the connection window,
while in RapidMiner it should be Spark 1.6.
So which one should I install? And should I configure Spark to run on Hadoop, or just install Spark without configuring it in Hadoop?
http://spark.apache.org/downloads.html
Do I need to download and install Hive to get a proper connection with Hadoop, or is it not mandatory?
And what is HiveServer2 needed for? Are Hive and HiveServer2 the same thing?
Thank you so much
Regards,
Ebtesam
Best Answers
bhupendra_patil Employee-RapidMiner, Member Posts: 168 RM Data Scientist
Your master address is the IP address, or a fully qualified name like server.corp.com, of the master node in your cluster.
If it is a single-node cluster, then you can use the IP address or hostname of that node.
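The master address can also be read straight from the cluster's own configuration: the standard Hadoop property `fs.defaultFS` in core-site.xml holds the HDFS master URI. A minimal sketch for pulling the host out of that file (the sample host name is illustrative; your core-site.xml usually lives under `$HADOOP_HOME/etc/hadoop/`):

```python
# Sketch: extract the NameNode host from a Hadoop core-site.xml.
# fs.defaultFS is the standard property holding the HDFS master URI,
# e.g. hdfs://master.example.com:8020 (host name here is illustrative).
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

def master_address(core_site_xml: str) -> str:
    """Return the host portion of fs.defaultFS from core-site.xml text."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return urlparse(prop.findtext("value")).hostname
    raise KeyError("fs.defaultFS not found")

sample = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master.example.com:8020</value>
  </property>
</configuration>"""

# The returned host is what you enter as Radoop's master address.
print(master_address(sample))  # → master.example.com
```

On a single-node setup this will simply print `localhost` or the machine's own hostname.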
As far as Spark goes, RapidMiner can only use Spark on Hadoop, so you will need to install Spark on the cluster.
What flavor of hadoop are you using? Apache? Cloudera? Hortonworks?
If you are just trying things out, your easiest bet is to use the VMs that are provided by Cloudera or Hortonworks.
phellinger Employee-RapidMiner, Member Posts: 103 RM Engineering
Hi Ebtesam,
the Spark 1.6 that was built for Hadoop 2.6 will work perfectly with Hadoop 2.7.1 as well.
You can download that to your cluster, and provide the HDFS (or local) path in the appropriate Radoop connection setting.
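To make that step concrete: you copy the Spark assembly jar (from a Spark 1.6 distribution built for Hadoop 2.6) into HDFS and enter its full HDFS URI in the Radoop connection setting. A small helper sketch for composing that URI; the directory layout below is an assumption for illustration, not a fixed Radoop requirement:

```python
# Sketch: compose the HDFS URI to enter as the Spark archive path in the
# Radoop connection dialog. The target directory is an illustrative
# assumption; the jar name matches the lib/ directory of a Spark 1.6
# "pre-built for Hadoop 2.6" download.
from posixpath import join

def spark_archive_uri(namenode: str, directory: str, jar: str, port: int = 8020) -> str:
    """Build an hdfs:// URI from NameNode host, HDFS directory, and jar name."""
    return f"hdfs://{namenode}:{port}" + join(directory, jar)

uri = spark_archive_uri(
    "master.example.com",                    # your NameNode host
    "/user/spark",                           # where you uploaded the jar
    "spark-assembly-1.6.3-hadoop2.6.0.jar",  # Spark 1.6 built for Hadoop 2.6
)
print(uri)
```

You would first upload the jar with something like `hdfs dfs -put` and then paste the printed URI into the connection setting.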
Also, Apache Hive will work on Java 8. Basically, you can expect almost anything that supports Java 7 to work on Java 8.
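On the Hive vs. HiveServer2 question above: Hive is the query engine itself, while HiveServer2 is the service that exposes it to remote clients over Thrift/JDBC, which is how Radoop talks to Hive. By default it listens on port 10000. A minimal reachability sketch, assuming that default port (host and port are assumptions; adjust to your cluster, and note this only tests the TCP connection, not Hive itself):

```python
# Sketch: check whether a HiveServer2 service is reachable on its default
# Thrift port (10000). Host and port are assumptions about your setup.
# A successful connect only proves the port is open, not that Hive works.
import socket

def hiveserver2_reachable(host: str, port: int = 10000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(hiveserver2_reachable("localhost"))  # True once HiveServer2 is up
```

If this returns False after you have started Hive, HiveServer2 is typically launched separately (e.g. via the `hiveserver2` script that ships with Hive).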
Peter
Answers
Thank you for your immediate reply.
I'm using Apache Hadoop.
For Spark, which version should I download? In "Configuring Radoop Connections" it is written
http://docs.rapidminer.com/radoop/installation/configuring-radoop-connections/
that it has to be version 1.6 or 1.5,
but the one compatible with the Hadoop version I have installed (2.7.1) is Spark 2.0.
So which one should I download and install?
As for Apache Hive: I haven't found a Hive release that is stated to work with Java 8;
the Hive versions seem to support only Java 7.
So how can I install Hive?
https://cwiki.apache.org/confluence/display/Hive/GettingStarted