RapidMiner

Moving data from RDBMS to Hadoop/Hive

by RMStaff on ‎08-09-2016 07:49 AM - edited on ‎09-21-2016 10:42 AM by Community Manager
Data often needs to be moved from relational databases into Hadoop before you can start leveraging the power of Hadoop.

Depending on the use case, this may be a one-time effort or something you need to do periodically. RapidMiner makes this easy with the RapidMiner Studio client and the Radoop extension. This article describes how to set up a RapidMiner workflow that imports data from a relational data store into Hadoop.

 

You can download the two products here:

https://my.rapidminer.com/nexus/account/index.html#downloads

or get in touch with us at https://rapidminer.com/contact-sales-request-demo/

 

  • Drag the Radoop Nest operator into a new process canvas.
  • Configure the Radoop connection (details on setting up Radoop connections are here: http://docs.rapidminer.com/radoop/installation/configuring-radoop-connections.html).
  • Provide a table prefix. The table prefix is used for temporary objects, which are automatically deleted in most cases.
  • Double-click the Radoop Nest operator.
  • Drag the "Read Database" operator from the Extensions>>Radoop>>Data Access>>Read group.
  • Configure the Read Database operator. For the connection, you can use a predefined database connection, a JDBC URL, or a JNDI name. To define the source, you can build a query, use a table name, or specify a SQL file.
  • Then connect the out port of Read Database to the "Store in Hive" operator.
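
As an illustration of the two inputs the Read Database operator needs when you choose the JDBC URL and query options, here is a minimal Python sketch that assembles a connection URL and a source query. The host, port, database, table, and column names are hypothetical placeholders, and the helper functions are not part of RapidMiner or Radoop. They only show the shape of the values you would paste into the operator's parameters.

```python
from typing import List, Optional


def build_jdbc_url(driver: str, host: str, port: int, database: str) -> str:
    """Return a JDBC URL of the common form jdbc:<driver>://<host>:<port>/<database>."""
    return f"jdbc:{driver}://{host}:{port}/{database}"


def build_select(table: str, columns: List[str], where: Optional[str] = None) -> str:
    """Build a simple SELECT statement; an empty column list selects all columns."""
    cols = ", ".join(columns) if columns else "*"
    query = f"SELECT {cols} FROM {table}"
    if where:
        query += f" WHERE {where}"
    return query


# Hypothetical source system: a PostgreSQL database named "sales".
jdbc_url = build_jdbc_url("postgresql", "dbhost.example.com", 5432, "sales")

# Hypothetical source table and filter: only import orders from 2016 onward.
query = build_select(
    "orders",
    ["order_id", "customer_id", "order_date", "amount"],
    where="order_date >= '2016-01-01'",
)

print(jdbc_url)
print(query)
```

Restricting the query to the columns and rows you actually need keeps the amount of data transferred into Hadoop small, which matters most when the import runs periodically.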

 

 

 

  • Use the Store in Hive configuration options to determine how the data is stored and partitioned, and whether it should use external tables, custom storage, or a custom SerDe.
  • The Store in Hive operator can also drop the table first if it already exists.
  • If you need to append to existing Hive tables, use the Append into Hive operator instead.
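
To make these options more concrete, the following Python sketch generates HiveQL statements that express the same concepts: dropping an existing table, creating a partitioned or external table with a storage format or SerDe, and appending rows. This is only an illustration of the underlying Hive features, not the statements Radoop actually issues. The function names, table names, columns, and HDFS path are hypothetical.

```python
from typing import Dict, Optional


def drop_table_sql(table: str) -> str:
    """HiveQL for the 'drop the table first if it exists' behaviour."""
    return f"DROP TABLE IF EXISTS {table}"


def create_table_sql(
    table: str,
    columns: Dict[str, str],
    partition_by: Optional[Dict[str, str]] = None,
    external: bool = False,
    location: Optional[str] = None,
    stored_as: Optional[str] = None,
    serde: Optional[str] = None,
) -> str:
    """Assemble a CREATE TABLE statement using Hive's clause order."""
    keyword = "CREATE EXTERNAL TABLE" if external else "CREATE TABLE"
    col_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    parts = [f"{keyword} {table} ({col_defs})"]
    if partition_by:
        part_defs = ", ".join(f"{n} {t}" for n, t in partition_by.items())
        parts.append(f"PARTITIONED BY ({part_defs})")
    if serde:
        parts.append(f"ROW FORMAT SERDE '{serde}'")
    if stored_as:
        parts.append(f"STORED AS {stored_as}")
    if location:
        parts.append(f"LOCATION '{location}'")
    return "\n".join(parts)


def append_sql(target: str, source: str) -> str:
    """HiveQL for appending all rows of one (unpartitioned) table to another."""
    return f"INSERT INTO TABLE {target} SELECT * FROM {source}"


# Hypothetical external table, partitioned by order date and stored as ORC.
ddl = create_table_sql(
    "orders",
    {"order_id": "BIGINT", "amount": "DOUBLE"},
    partition_by={"order_date": "STRING"},
    external=True,
    location="/data/orders",
    stored_as="ORC",
)

print(drop_table_sql("orders"))
print(ddl)
print(append_sql("orders_history", "orders_staging"))
```

Dropping and recreating the table suits full reloads, while appending suits periodic incremental imports where earlier data must be kept.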


 

 

  • To run the process right away, click the blue play button at the top.


 

  • You can also schedule the process to run if you have RapidMiner Server installed and configured.

 


 

 

You can add more than one of these Read Database and Store in Hive pairs to a single process.

Sometimes the data needs some preparation before it is stored. You can build such workflows easily with RapidMiner, as shown in the screenshot below.

 

[Screenshot: example process in RapidMiner Studio Developer 7]

 

 

 
