"Import into Hive database contains newline characters."
Hi guys,
I'm trying to use the Read Database operator within a Radoop Nest and getting this error:
"The nominal attribute 'ABCDE' has a value that contains a newline. Please replace or remove newlines."
What is the best way to deal with it? I have tried changing the file format from the default to AVRO and others to see if it will get past it, but still no joy.
Is it a bug?
Best Answer
mborbely Member Posts: 14 Contributor II
Hi JEdward,
Please note that with this operator, and in Radoop generally, the selected storage format doesn't matter at this stage: if you want to store an example set that hasn't been materialized in Hive, all of its records are first written to HDFS in text format, and a temporary table is created on top of them. This temporary table is then used to populate the final table with the correct storage format. Therefore, changing the storage format won't solve this problem.
Radoop throws this error as a preventive measure, since older Hive versions didn't handle newlines in the data properly. Fortunately, this changed in newer versions (see https://issues.apache.org/jira/browse/HIVE-11785). However, Radoop has not yet been adapted to these changes. Therefore, even if you use Hive 2.0+, your only option at the moment is to replace these newline characters manually (e.g. in the SQL query of the Read Database operator).
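As a rough illustration of replacing newlines in the query itself, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for the source database. The table name customers and the column name abcde are made up for the example, and char(13)/char(10) is SQLite syntax; other databases spell this CHAR() or CHR(), so the SELECT would need adjusting before pasting it into Read Database.

```python
import sqlite3

# Stand-in source database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, abcde TEXT)")
conn.execute(
    "INSERT INTO customers VALUES (?, ?)",
    (1, "first line\r\nsecond line"),
)

# Drop carriage returns and turn line feeds into spaces inside the query,
# so the values no longer contain newlines when they are read.
query = """
SELECT id,
       REPLACE(REPLACE(abcde, char(13), ''), char(10), ' ') AS abcde
FROM customers
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'first line second line')]
```

Only the SELECT statement is the part that would go into the operator; the rest just sets up data to try it on.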
Cheers,
Máté
Answers
Hi,
By default, Radoop uses the TEXTFILE format, which is sensitive to newlines. The recommended approach is to switch to the PARQUET format. Did it work for you?
Attila
Darn, hopefully it will be something incorporated in the near future.
I'm actually feeding in my list of database tables as attributes and then using Read Database within a Loop Attributes to import a whole bunch of tables automatically and in parallel. I'll need to try and get creative with my query writing.
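One possible way to generate such queries for many tables is sketched below. The function build_clean_select and its parameters are hypothetical helpers, not part of RapidMiner or Radoop, and the sketch assumes you already know which columns are text columns and that the table and column names are trusted (nothing is quoted or escaped). The char_fn argument is there because the character function differs between databases (CHAR in some, CHR in others).

```python
def build_clean_select(table, columns, text_columns, char_fn="CHAR"):
    """Build a SELECT that strips CR and replaces LF with a space in text columns.

    Hypothetical helper: names are inserted as-is, so only use trusted input.
    """
    parts = []
    for col in columns:
        if col in text_columns:
            # Wrap text columns so their values contain no newlines.
            parts.append(
                f"REPLACE(REPLACE({col}, {char_fn}(13), ''), {char_fn}(10), ' ') AS {col}"
            )
        else:
            # Leave non-text columns untouched.
            parts.append(col)
    return f"SELECT {', '.join(parts)} FROM {table}"


# Example with made-up table and column names.
print(build_clean_select("orders", ["id", "note"], {"note"}))
```

The resulting string could then be supplied as the query for each table in the loop, for example through a macro.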