RapidMiner

Store in Hadoop Hive Using Custom Partitioning

by RMStaff on 08-04-2016 01:33 PM

When working with Hadoop, RapidMiner Radoop provides a code-free way to store data in Hive. For better retrieval performance, especially when you often filter data on specific columns, you can gain a lot by storing the data in separate partitions, because Hive can then read only the partitions that match the filter instead of scanning the whole table.

The RapidMiner Radoop extension for Hadoop processing lets you define partitioning rules based on one or more columns. Rows with different values in these columns are then stored in separate partitions and handled separately by Hive. This article describes the steps to enable partitioning when storing data with the Store in Hive operator.
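
Conceptually, Hive keeps each distinct combination of partition column values in its own directory (for example `region=EU`), so a query that filters on a partition column only has to read the matching directories. The Python sketch below illustrates that idea with in-memory dictionaries. It is not how Radoop or Hive is implemented, and the helper names `partition_rows` and `scan_with_filter` as well as the sample `sales` data are hypothetical.

```python
from collections import defaultdict

# Illustration only: hypothetical helpers, not part of Radoop or Hive.

def partition_rows(rows, partition_cols):
    """Group rows by a Hive-style partition path, e.g. 'region=EU/year=2016'.

    Partition values go into the path and are dropped from the stored row,
    mirroring how Hive keeps partition values in directory names rather than
    in the data files. Hive's escaping of special characters is ignored here.
    """
    partitions = defaultdict(list)
    for row in rows:
        path = "/".join(f"{col}={row[col]}" for col in partition_cols)
        partitions[path].append({k: v for k, v in row.items() if k not in partition_cols})
    return dict(partitions)

def scan_with_filter(partitions, col, value):
    """Return (partitions read, rows read) for the filter col == value.

    Only partitions whose path contains the segment 'col=value' are opened,
    which is the idea behind Hive partition pruning.
    """
    segment = f"{col}={value}"
    scanned = [path for path in partitions if segment in path.split("/")]
    rows = [row for path in scanned for row in partitions[path]]
    return scanned, rows

# Hypothetical sample data.
sales = [
    {"region": "EU", "year": 2016, "amount": 10},
    {"region": "US", "year": 2016, "amount": 20},
    {"region": "EU", "year": 2015, "amount": 5},
]
parts = partition_rows(sales, ["region"])
scanned, hits = scan_with_filter(parts, "region", "EU")
print(scanned)    # ['region=EU']
print(len(hits))  # 2
```

The filter on `region` touches one of the two partitions; the `region=US` rows are never read.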

To see this option, click "Show advanced Parameters" in the Parameters view of the Store in Hive operator.

[Screenshot: Parameters view of the Store in Hive operator, RapidMiner Studio 7.2]

Then click "Select Attributes" for the "partition by" parameter.

[Screenshot: the "partition by" parameter and its "Select Attributes" button]

In the pop-up window, move the attributes you want to partition by from the left list to the right list.

[Screenshot: selecting partition attributes in the pop-up window]

You can also change the partitioning order by moving an attribute up or down in the right list.
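
In Hive, the order of the partition columns determines how partition directories are nested: the first column becomes the top-level directory. The minimal Python sketch below shows this, assuming the order of the right-hand list in Radoop becomes the column order of the Hive table's partitioning. The helpers `partition_path` and `top_level_dirs` and the sample rows are hypothetical and serve only as an illustration.

```python
# Illustration only: hypothetical helpers, not part of Radoop or Hive.

def partition_path(row, partition_cols):
    """Build a Hive-style partition path; the first column is the top-level directory."""
    return "/".join(f"{col}={row[col]}" for col in partition_cols)

def top_level_dirs(rows, partition_cols):
    """Distinct first-level directories produced by a given partition column order."""
    return sorted({partition_path(row, partition_cols).split("/")[0] for row in rows})

# Hypothetical sample rows.
rows = [
    {"region": "EU", "year": 2015},
    {"region": "EU", "year": 2016},
    {"region": "US", "year": 2016},
]

print(partition_path(rows[0], ["region", "year"]))  # region=EU/year=2015
print(partition_path(rows[0], ["year", "region"]))  # year=2015/region=EU
print(top_level_dirs(rows, ["region", "year"]))     # ['region=EU', 'region=US']
print(top_level_dirs(rows, ["year", "region"]))     # ['year=2015', 'year=2016']
```

The same rows end up in the same partitions either way; only the nesting of the directories changes.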

If an attribute is not visible in the left list, type its name manually into the highlighted box below and then click the plus icon to add it to the list.

[Screenshot: typing an attribute name and adding it with the plus icon]