Data often needs to be moved from relational databases into Hadoop before you can start leveraging the power of Hadoop.
Depending on the use case, this may be a one-time effort or something you need to do periodically. RapidMiner makes this easy with the RapidMiner Studio client and the Radoop extension. This article describes how to set up a RapidMiner workflow that imports data from a relational data store into Hadoop.
Provide a table prefix. Table prefixes are used for temporary objects, which are automatically deleted in most cases.
Double-click the Radoop Nest operator.
Drag the "Read Database" operator from the Extensions >> Radoop >> Data Access >> Read group.
Configure the Read Database operator. For the connection, it lets you use a predefined database connection, a JDBC URL, or a JNDI name. To define the source, you can build a query, use a table name, or specify a SQL file.
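If you choose the JDBC URL and query options, the values you enter typically look like the strings built in the sketch below. This is only an illustration: the helper names, host, port, database, and table are hypothetical placeholders, and the two URL templates cover just the common MySQL and PostgreSQL forms.

```python
# Hypothetical helpers that show the general shape of a JDBC URL and a
# source query. None of the names or values below come from RapidMiner.
JDBC_TEMPLATES = {
    "mysql": "jdbc:mysql://{host}:{port}/{database}",
    "postgresql": "jdbc:postgresql://{host}:{port}/{database}",
}


def build_jdbc_url(vendor, host, port, database):
    """Return a JDBC URL for one of the vendors listed in JDBC_TEMPLATES."""
    try:
        template = JDBC_TEMPLATES[vendor]
    except KeyError:
        raise ValueError(f"no URL template for vendor: {vendor}")
    return template.format(host=host, port=port, database=database)


def build_select(table, columns=None, where=None):
    """Return a simple SELECT statement to use as the source query."""
    column_list = ", ".join(columns) if columns else "*"
    query = f"SELECT {column_list} FROM {table}"
    if where:
        query += f" WHERE {where}"
    return query


if __name__ == "__main__":
    # Placeholder connection details, for illustration only.
    print(build_jdbc_url("mysql", "db.example.com", 3306, "sales"))
    print(build_select("orders", ["id", "amount"], "amount > 100"))
```

Restricting the columns and rows in the source query keeps the amount of data transferred into Hadoop down to what you actually need.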
Then connect the output port of the Read Database operator to a "Store in Hive" operator.
You can use the Store in Hive configuration options to determine how the data is stored and partitioned, and whether it should use external tables, custom storage, or a custom SerDe.
The Store in Hive operator can also drop the target table first if it already exists.
If you need to append to existing Hive tables, use the Append to Hive operator instead.
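For readers less familiar with Hive, the storage options above correspond to standard HiveQL concepts. The sketch below builds the kind of statements those concepts map to. It is not the SQL that Radoop itself generates, and the function names, table names, and columns are hypothetical.

```python
# Illustrative only: these hypothetical helpers build plain HiveQL strings
# for the concepts named above (drop first, external table, partitioning,
# custom SerDe, storage format, append). They do not reproduce Radoop.
def build_store_statements(table, columns, partition_by=None,
                           external_location=None, stored_as=None,
                           serde=None, drop_first=False):
    """Return the HiveQL statements for creating a table with the given options.

    columns and partition_by are lists of (name, hive_type) tuples.
    """
    statements = []
    if drop_first:
        # Same idea as dropping the table first if it already exists.
        statements.append(f"DROP TABLE IF EXISTS {table}")

    column_sql = ", ".join(f"{name} {hive_type}" for name, hive_type in columns)
    ddl = "CREATE EXTERNAL TABLE" if external_location else "CREATE TABLE"
    ddl += f" {table} ({column_sql})"
    if partition_by:
        partition_sql = ", ".join(f"{name} {hive_type}" for name, hive_type in partition_by)
        ddl += f" PARTITIONED BY ({partition_sql})"
    if serde:
        ddl += f" ROW FORMAT SERDE '{serde}'"
    if stored_as:
        ddl += f" STORED AS {stored_as}"
    if external_location:
        ddl += f" LOCATION '{external_location}'"
    statements.append(ddl)
    return statements


def build_append_statement(target, source):
    """Return a HiveQL statement that appends all rows of source to target."""
    return f"INSERT INTO TABLE {target} SELECT * FROM {source}"


if __name__ == "__main__":
    for statement in build_store_statements(
            "orders",
            [("id", "INT"), ("amount", "DOUBLE")],
            partition_by=[("order_date", "STRING")],
            stored_as="ORC",
            drop_first=True):
        print(statement)
    print(build_append_statement("orders", "orders_staging"))
```

Dropping and recreating the table suits full reloads, while appending suits periodic imports that add new rows to a table that already exists.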
To run the process now, click the blue play button at the top.
You can also schedule the process to run if you have RapidMiner Server installed and configured.
You can add more than one of these read-store pairs to the same process, as shown below.
Sometimes the data needs some preparation before it is stored. You can build those workflows easily with RapidMiner, as shown in the screenshot below.