Spark job could not succeed for any supported Spark Version on Cloudera
Symptoms
The following error message appears when running the Full Test to connect to a Cloudera cluster from the RapidMiner platform:
"The Spark job could not succeed for any supported Spark Version. It seems that the specified assembly jar or its location is incorrect: local:///opt/cloudera/parcels/CDH/lib/spark/lib/spark-assembly.jar"
Diagnosis
- Verified that the spark-assembly.jar is located on all the nodes.
- Made sure there is no mismatch between the Spark version selected in the Configuration Properties of the Radoop Manage Connections window and the Spark version of the Hadoop cluster.
Solution
Cloudera's latest Spark builds (shipped with CDH 5.11 and 5.12) differ somewhat from the corresponding Apache Spark versions (they don't accept executor-cores and executor-memory options).
Using an Apache Spark release instead works fine. It can be installed on HDFS with the following or similar commands:
# do a kinit call, if Kerberos is used on the cluster
wget -O /tmp/spark-1.6.3-bin-hadoop2.6.tgz https://d3kbcqa49mib13.cloudfront.net/spark-1.6.3-bin-hadoop2.6.tgz
tar xzvf /tmp/spark-1.6.3-bin-hadoop2.6.tgz -C /tmp/
hadoop fs -mkdir -p /tmp/spark
hadoop fs -put /tmp/spark-1.6.3-bin-hadoop2.6/lib/spark-assembly-1.6.3-hadoop2.6.0.jar /tmp/spark/
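Before updating the connection, it may be worth confirming that the upload actually landed in HDFS. The following is a minimal sketch, not part of the original procedure: check_assembly is a hypothetical helper name, and it assumes the hadoop client is on the PATH and that a valid Kerberos ticket exists if the cluster is secured.

```shell
# Hypothetical helper (not part of Radoop or Hadoop): reports whether a
# given HDFS path exists, using the standard "hadoop fs -test -e" check.
check_assembly() {
  jar_path="$1"
  if hadoop fs -test -e "$jar_path"; then
    # Print the location in the hdfs:// form used in the connection settings
    echo "found: hdfs://$jar_path"
  else
    echo "missing: $jar_path" >&2
    return 1
  fi
}
```

It could be called as check_assembly /tmp/spark/spark-assembly-1.6.3-hadoop2.6.0.jar after the upload. A non-zero exit status suggests that the put command did not complete or that a different target directory was used.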
With the Apache Spark assembly uploaded this way, the assembly location specified in the Radoop connection should be:
"hdfs:///tmp/spark/spark-assembly-1.6.3-hadoop2.6.0.jar"