
Radoop connection issues in v7.3

kriskris Member Posts: 3 Contributor I
edited November 2018 in Help

Hi,

 

I recently upgraded from RapidMiner v7.2 to v7.3. Since the upgrade, Radoop throws java.util.concurrent.TimeoutException while connecting to HiveServer2. In another RapidMiner installation (v7.2), the same configuration works fine.

 

Current config details:

Hadoop version: Apache Hadoop 2.2+

Hadoop user name: hadoop

Hive Server2 (Hive 0.13 or newer)

default

 

hive

Spark 1.6

hdfs:///user/spark/spark-assembly.jar

 

Are there any configuration changes to be made in Radoop for v7.3? I have tried RapidMiner v7.3 with Radoop 7.2 as well as RapidMiner v7.3 with Radoop 7.3. Neither combination works. Please help.

 

-Kris

 


Answers

  • phellingerphellinger Employee-RapidMiner, Member Posts: 103 RM Engineering

    Hi Kris,

     

    It would be a bit surprising if Studio 7.2 and 7.3 behaved differently with the same Radoop version. (So it is valuable if we find such a case. :smileywink: ) Can you reproduce this behaviour consistently?

     

    I'll copy my answer on how to move on with the problem from another topic.

     

    The error states that there was no response from the HiveServer2 instance (specified by either the Master Address or the Hive Server Address field, and the Hive Port) within the given time.

    I would try the following:

    • Check the Hive log on the cluster. Does the SHOW TABLES command that the test sends appear in the log? (It can take a few seconds on the first try.) If it does, Hive is reachable but may be taking longer to respond than the timeout allows.
    • If the log shows that the command was sent to Hive, then you can increase the timeout in Studio: go to Preferences -> Radoop, and increase the Connection timeout and Hive command timeout values to, let's say, 30. (These timeouts are used for detecting connection problems.)
    • If there is nothing in the log, then I would make sure that the specified address and port can be reached from the machine that runs Studio. If that works, I would check the health of Hive on the cluster from Beeline, for example.
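A quick way to run the reachability check from the last step is a plain TCP connect from the machine that runs Studio. This is only a minimal sketch: the host name below is a placeholder, and port 10000 is the usual HiveServer2 default, so substitute whatever is in your Hive Server Address and Hive Port fields. Note that a raw socket like this connects directly and does not use any proxy settings, so a success here only shows that the network path itself is open.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder host name (an assumption); replace with your HiveServer2 address.
    print(can_reach("hive.example.com", 10000))
```

If this returns False, the problem is likely at the network level (firewall, security group, wrong address). If it returns True and Studio still times out, the cause is more likely on the client side or in Hive itself.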

     

    Best,

    Peter

  • kriskris Member Posts: 3 Contributor I

    Thanks for the response, Peter.

     

    Yes, the behaviour is consistently reproducible. Yesterday I created an Amazon EMR cluster and tried connecting to it through Radoop. The same issue persists even if I open all inbound ports on the EMR master instance.

     

    All URLs (namenode, history server, Spark etc.) are accessible remotely. Only the HiveServer2 connection fails. I tried increasing the timeouts earlier, up to 4 minutes, but no luck. Hive works through Beeline (tested locally on the cluster).

     

     

    Let me know if there are any other tests I can try out. 

     

     

  • kriskris Member Posts: 3 Contributor I

    Peter, 

    Figured out the issue and resolved it.

     

    The problem is a change made in RapidMiner v7.3 to the proxy settings (system -> preferences). Earlier, under system, one had to explicitly specify an HTTP proxy, and the default was no proxy. In the new version, the proxy is a separate option (under system -> preferences), and by default it is set to 'System proxy'. Once I changed it to Direct (no proxy), it worked fine. I think the default option should be no proxy.
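For anyone wanting to check whether their machine has a system-level proxy configured at all, here is a rough sketch using Python's standard library. It is an assumption that what `urllib.request.getproxies()` reports overlaps with what the 'System proxy' option in Studio picks up, since Studio does its own detection, so treat the output as a hint rather than proof. The `describe_proxies` helper is just a name made up for this example.

```python
import urllib.request


def describe_proxies(proxies: dict) -> str:
    """Summarise a scheme -> proxy URL mapping as a single readable line."""
    if not proxies:
        return "no proxy configured (direct connection)"
    return ", ".join(f"{scheme} -> {url}" for scheme, url in sorted(proxies.items()))


if __name__ == "__main__":
    # getproxies() reads the *_proxy environment variables and, on Windows
    # and macOS, the operating system's proxy settings when those are unset.
    print(describe_proxies(urllib.request.getproxies()))
```

If this lists a proxy, it is worth trying the Direct (no proxy) setting in Studio before digging into the cluster side.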

     

    Sharing this in case it helps others who run into similar issues after the upgrade.

     

    -Kris
