
SparkRM, Hive, TEZ, Python, R, PySpark, SparkR - What is the Sequence? Or, The Radoop Matryoshka

KostasBonikos, Member. Edited November 2018 in Knowledge Base.

Question: If I put a Hive operator inside a SparkRM, does it become a Spark job?

No. Only standard RapidMiner operators can be used inside SparkRM; Hive operators are not supported there. However, you can configure Hive to use Spark as its execution engine, in which case all Hive operators in Radoop run on Spark. This is controlled by the Hive property hive.execution.engine, which you can set in the Radoop connection.
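As a minimal sketch, assuming your Radoop connection lets you add extra Hive properties (the exact name of that connection field may vary by Radoop version), the setting would look like this:

    hive.execution.engine = spark

The same value can also be set for a single Hive session with "SET hive.execution.engine=spark;".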

Question: If using Hortonworks and Hive with embedded TEZ, do my Hive operators automatically leverage TEZ?

As in the previous question, you just need to set the hive.execution.engine property in the connection to "tez".
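It is the same property as above, just with a different value (again assuming the connection accepts additional Hive properties):

    hive.execution.engine = tez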

Question: Can I execute Python or R inside a Radoop nest, and will it execute on the cluster?

The easiest way is to use PySpark or SparkR with the "Spark Script" operator; a short sketch follows below.

If, for example, you need an R package that is not available in SparkR, you can instead use SparkRM as above, but in that case R (and the required packages) must be installed on the cluster nodes, at the same path on each node.
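As an illustration, here is the kind of PySpark code that could go into a "Spark Script" operator. The entry-point name rm_main, the objects it receives (context, input data frame), and the column name "amount" are all assumptions for this sketch; the exact signature depends on your Radoop version, so check the operator's help before copying it.

    from pyspark.sql import functions as F

    # Hypothetical entry point: assume the Spark Script operator calls a function
    # and hands it a context plus the example set connected to its input port.
    def rm_main(sc, sqlContext, input_df):
        # A simple distributed transformation: filter rows and add a derived column.
        result = (input_df
                  .filter(F.col("amount") > 0)                      # keep positive amounts only
                  .withColumn("amount_log", F.log(F.col("amount"))))  # add log-transformed column
        # Whatever is returned is delivered to the operator's output port.
        return result

The important point is that this code is executed by Spark on the cluster, not in Studio, so it should only use Spark data frames and functions, not local pandas or base-R objects.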

Question: Can I run Hive operators on Spark without Hadoop?

No, we don't integrate with Spark without Hadoop. You need a Hive server and YARN installed. However, you can have Spark as Hive's execution engine.

Question: When writing PySpark, where should I execute the code? Radoop nest, SparkRM or Studio?

Use the "Spark Script" operator, and place it inside the Radoop nest.
