Moving data from RDBMS to Hadoop/Hive

bhupendra_patil · August 2016

Many a times there is a need to move Data from relational databases to hadoop to start leveraging the power of Hadoop.

Depending on the use case it may be a one time effort or you may need to do this periodically. Rapidminer provides way to do this very easily using the Rapidminer Studio client and Radoop extension. This article will describe setting up a RapidMiner workflow to import data from Relational data store into Hadoop.

You can download the two products from here

https://my.rapidminer.com/nexus/account/index.html#downloads

or get in touch with us at https://rapidminer.com/contact-sales-request-demo/

Use the Radoop Nest operator and drag it into a new process canvas
Configure the Radoop Connection ( The details for Setting up Radoop Connections are here http://docs.rapidminer.com/radoop/installation/configuring-radoop-connections.html)
Provide table prefix(Table prefix are used by temporary objects, Temporary objects are automatically deleted in most cases)
Double click on Radoop Nest operator
Drag the "Read Database" operator from the Extensions>>Radoop>>Data Access>>Read group
Configure the Read Database operator, it allows you to use a predefiend database connection, use jdbc url or jndi name for connection. You can build a query, use atablename or specify a sql file to define the source.
Then connect the out port of the read database to "store in hive" operator

You can use the store in hive configuration options to determinw how the data is stored, partitioned, if it should use external tables, customer storage and custom SerDe.
The Store in hive operator also allows to drop first table if it exists.
In case where you need to append to existing hive tables, use the Append to hive operator instead.

2016-08-09 12_16_48-_new process__ – RapidMiner Studio Developer 7.2.000 @ RMUS-BPATIL.png /

To run the process now, you can hit the blue play button at the top

2016-08-09 12_18_19-_new process__ – RapidMiner Studio Developer 7.2.000 @ RMUS-BPATIL.png

You can also schedule the process to run if you have RapidMiner server installed and configured.

2016-08-09 12_19_53-__localserver_delete this_run process on server – RapidMiner Studio Developer 7..png

You can absolutely add more than one of these read- store pairs like seen below.

Sometimes there may be a need to do some data prep before it is actually stored. You can build those workflows easily with RapidMiner as seen in screen shot below

2016-08-09 13_13_07-__localserver_delete this_run process on server_ – RapidMiner Studio Developer 7.png

You can download the two products from here

https://my.rapidminer.com/nexus/account/index.html#downloads

or get in touch with us at https://rapidminer.com/contact-sales-request-demo/

Howdy, Stranger!

Quick Links

Categories

Altair RapidMiner Community

GET HELP. LEARN BEST PRACTICES. NETWORK WITH YOUR PEERS.

Moving data from RDBMS to Hadoop/Hive