Process Failed: HiveQL problem
I've installed Hadoop using Ambari Server (Hortonworks Data Platform HDP-2.6). The Radoop connection test passed without any errors, and I am able to store and retrieve data in Hive from RapidMiner using that connection. However, whenever I run a Spark-related process, for example the k-means tutorial process in Radoop, I get the following error:
com.rapidminer.operator.OperatorException: HiveQL problem (org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 1:17 Invalid path ''/tmp/radoop/admin/tmp_1525790714419a_csh8jcj/'': No files matching path hdfs://node.server.com:8020/tmp/radoop/admin/tmp_1525790714419a_csh8jcj)
SEVERE: Process failed: HiveQL problem (org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: SemanticException Line 1:17 Invalid path ''/tmp/radoop/admin/tmp_1525790714419a_csh8jcj/'': No files matching path hdfs://node.server.com:8020/tmp/radoop/admin/tmp_1525790714419a_csh8jcj)
I'm also able to run Spark programs from the terminal, but I get this error when I run them through Radoop.
Can anybody suggest a solution for this?
Comments
The process works fine when we run the k-means clustering (not the Radoop operator) inside SparkRM and enable the "merge output" and "resolve schema conflicts" options.
But the same error still persists for the tutorial process.
Hi,
Are you referring to the "Full Test" when you say "The connection to Radoop passed without any errors"?
It would be helpful to run the Full Test (it may take some time), since it includes a Spark job test as well.
If the Full Test succeeds, there may be a hidden error in the submitted Spark job, which could result in missing output, but that is just a guess. Do you have access to the Resource Manager UI (typically at port 8088) of the cluster?
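If the command line is easier to reach than the web UI, the YARN CLI can show roughly the same information; a small sketch (the application id is a placeholder to replace with the real one):

    # List recently finished applications and their final status
    yarn application -list -appStates FINISHED,FAILED,KILLED
    # Fetch the logs of one particular run (replace the placeholder application id)
    yarn logs -applicationId <application_id>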
Best,
Peter
The Full Test passed without any errors, and I'm able to access the Resource Manager too. The process (Spark job) shows its status as "SUCCEEDED". I ran into this problem with the Radoop k-means tutorial process.
In the Hadoop Data tab we are able to view the clustered data, but the process still ends with the same SemanticException error.
Hi,
This problem occurs when Radoop tries to load the output of the Spark job into a Hive table. My first assumption would be that the user running the Hive jobs cannot access the files created by the Spark job (which are owned by user 'admin'). We have to find the exact reason behind this, but for that we need to know a bit more about your configuration.
On the other hand, your sentence "In the Hadoop Data tab we are able to view the clustered data" makes me doubt this assumption. Just to be sure: you can see it among the tables, and not through the Import Files from HDFS option, right? And are you 100% sure you see the result of the currently executed process? Because if you are, we have to look in an entirely different direction.
If you disable "cleaning" option on the Radoop nest and after running the process, list the contents of the folder (belonging to the Invalid path, with user admin), do you see your data files? Or do you just see an empty folder?
1. We've not enabled any kind of security.
2. There is no authentication (hive.server2.authentication is set to NONE)
3. The "hive.server2.enable.doAs" attribute is set to "true" but "enabled" is set to 'F'(In the configRadoop_xml.txt)
We can see the clustered data among the tables (please see ClusteredData.png).
We tried with the "cleaning" option disabled: we are able to list the contents of the folder '/tmp/radoop/admin/' (HDFSls.png), but the folder that the HiveQL exception refers to is empty (RstudioLog.png).
Hi,
Thanks for your response. This seems to be a Hive bug, and I need to share a few details to explain the problem. First off, if such an operation fails, Radoop simply retries it. If the second try also fails, you see an error message, but only for the second try. This is rather unfortunate when the first failure was caused by something different, which can only happen if the first try has some unwanted side effect, and that is exactly what happened in this case. Since you didn't have a chance to see and report the original problem, I reproduced the issue on one of our test clusters.
The issue seems to be caused by the "hive.warehouse.subdir.inherit.perms" setting. Because of it, Hive tries to take ownership of the data files placed under its warehouse directory: after issuing the load command, Hive first moves the file into the warehouse directory and then changes its ownership. This is where the problem occurs, since in your setup that operation is not permitted for user hive, who is neither the owner of the file nor a superuser. However, this shouldn't fail the statement, because the Hive docs (https://cwiki.apache.org/confluence/display/Hive/Permission+Inheritance+in+Hive) state that "Failure by Hive to inherit will not cause operation to fail." It is therefore a bug in Hive.
However, by this time the files have already been moved into the warehouse directory and can actually be queried through Hive, which is why you can see the tables in the Hadoop Data view. The second try then fails with a different error, since the files to be imported are no longer present at their original location.
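To illustrate the sequence (the table name and the JDBC URL below are placeholders; the temporary path is the one from your error message), this is roughly what happens under the hood:

    # Radoop issues a LOAD into a temporary Hive table, conceptually equivalent to:
    beeline -u jdbc:hive2://node.server.com:10000 \
      -e "LOAD DATA INPATH '/tmp/radoop/admin/tmp_1525790714419a_csh8jcj/' INTO TABLE radoop_tmp_table;"
    # 1. HiveServer2 moves the files under its warehouse directory
    #    (the location is governed by hive.metastore.warehouse.dir).
    # 2. Because hive.warehouse.subdir.inherit.perms=true, it then tries to change
    #    the ownership/group of the moved files. That call is rejected for user
    #    'hive' (neither owner nor superuser), so the whole statement fails even
    #    though the data has already been moved.
    # 3. Radoop's automatic retry of the LOAD then fails with "No files matching
    #    path ...", because the temporary directory is empty by now.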
Luckily there are some workarounds for this problem:
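For example, the most direct one following from the analysis above is to turn off the inheritance behaviour itself; a sketch, assuming the default HiveServer2 port on the host from your error message:

    # Check the value HiveServer2 currently uses (it comes from the cluster's hive-site.xml)
    beeline -u jdbc:hive2://node.server.com:10000 -e "SET hive.warehouse.subdir.inherit.perms;"
    # If it prints 'true', set hive.warehouse.subdir.inherit.perms=false in Ambari
    # (Hive > Configs) and restart HiveServer2, so Hive no longer tries to change
    # the ownership of files it loads into the warehouse.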
Side note: you mentioned that the "hive.server2.enable.doAs" attribute is set to "true" but "enabled" is set to 'F' (in configRadoop_xml.txt). The thing is, these settings don't come from Radoop; they are cluster-side configurations that are added to the connection entry during the Ambari import as disabled Advanced Settings, mainly for informational purposes. In fact, Radoop cannot even override this property dynamically, so this is the actual configuration your cluster uses, regardless of it being disabled on the Radoop connection.