Custom storage handlers on Hadoop when using Radoop "Store in Hive"
When you use the RapidMiner Radoop "Store in Hive" operator, you may need a custom storage handler.
Storage handlers allow Hive to access data that is stored and managed by other systems.
RapidMiner’s “Store in Hive” operator provides a lot of flexibility when it comes to saving data in Hive tables or in external tables on HDFS or Amazon S3.
Additionally, custom storage handlers may allow you to use Hypertable, Cassandra, JDBC, MongoDB, or Google Spreadsheets, as documented here.
To enable custom storage, make sure the advanced parameters are visible.
Now click the “Custom Storage” checkbox to explore the options for using custom storage handlers.
Once you select the “Custom Storage” option, additional parameters become available.
When providing the custom storage handler, make sure that its class exists on the CLASSPATH of the Hive server.
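Before running the process, it can help to confirm that the handler class is actually packaged in the jar you intend to place on the Hive server's classpath. The following is a minimal sketch of such a check. The helper name `jar_contains_class` is hypothetical, and the check only inspects a single jar file; it does not tell you what the Hive server's effective classpath is.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar has a .class entry for the fully qualified class name."""
    # A jar is a zip archive; org.example.Foo is stored as org/example/Foo.class.
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        # Entry names are compared exactly, so the class name's case must match.
        return entry in jar.namelist()
```

Call it with the path to the handler's jar and the same fully qualified class name you plan to type into the operator. A `False` result usually means a typo in the class name or the wrong jar.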
The user-defined SerDe properties can then be added by clicking the “Edit List” button.
Please note that the SerDe properties are case sensitive.
Download RapidMiner Radoop for free today from http://bit.ly/RadoopDL