Store in Hadoop Hive using Custom partitioning

bhupendra_patil · August 2016

When working with Hadoop, RapidMiner Radoop provides a code-free way to store data in Hive. Retrieval performance can often be improved considerably by storing the data in separate partitions, especially if you filter the data on specific columns.

The RapidMiner Radoop extension for Hadoop processing lets you define partitioning rules based on one or more columns. Hive then handles rows with different values in those columns separately. This article describes the steps to enable partitioning in the Store in Hive operator.
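To make "handled separately" more concrete, here is a minimal Python sketch of how Hive lays out a partitioned table: each distinct combination of partition values gets its own directory named `column=value`, and the partition columns are kept in the path rather than in the data files. The sample rows, column names, and the `partition_rows` helper are hypothetical and only illustrate the idea; this is not what Radoop runs internally.

```python
from collections import defaultdict


def partition_rows(rows, partition_cols):
    """Group rows by their partition values, keyed by a Hive-style path."""
    partitions = defaultdict(list)
    for row in rows:
        # Hive names each partition directory col=value, nested in column order.
        path = "/".join(f"{col}={row[col]}" for col in partition_cols)
        # Partition values are encoded in the path, so drop them from the row data.
        data = {k: v for k, v in row.items() if k not in partition_cols}
        partitions[path].append(data)
    return dict(partitions)


# Hypothetical sample data, not taken from the article.
rows = [
    {"country": "US", "year": 2015, "sales": 10},
    {"country": "US", "year": 2016, "sales": 12},
    {"country": "DE", "year": 2016, "sales": 7},
    {"country": "US", "year": 2016, "sales": 3},
]

by_country_year = partition_rows(rows, ["country", "year"])
for path, data in sorted(by_country_year.items()):
    print(path, data)
```

A query that filters on `country` or `year` only needs to read the matching directories, which is where the performance gain comes from.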

To see the option, click "Show advanced parameters" in the Parameters view of the Store in Hive operator.

[Screenshot: "Show advanced parameters" in the Parameters view of the Store in Hive operator, RapidMiner Studio Developer 7.2.000]

Then click the "Select Attributes" option for the "partition by" parameter.

[Screenshot: "Select Attributes" option for the "partition by" parameter, RapidMiner Studio Developer 7.2.000]

In the pop-up window, move the attributes you want to partition by from the left list to the right list.

[Screenshot: attribute selection window with the partition attributes moved to the right list, RapidMiner Studio Developer 7.2.000]
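In plain Hive terms, choosing partition attributes corresponds to declaring a table with a `PARTITIONED BY` clause. The Python sketch below builds such a statement as a string to show how the selected attributes move out of the regular column list. The statement Radoop actually generates is not shown in this article, so treat the `build_partitioned_ddl` helper, the table name, and the column types as assumptions for illustration.

```python
def build_partitioned_ddl(table, columns, partition_cols):
    """Return a Hive CREATE TABLE statement with a PARTITIONED BY clause.

    columns: dict of column name -> Hive type, in table order.
    partition_cols: names from `columns` to partition by, in partition order.
    """
    missing = [c for c in partition_cols if c not in columns]
    if missing:
        raise ValueError(f"unknown partition columns: {missing}")
    # In Hive DDL, partition columns are declared only in PARTITIONED BY,
    # not in the regular column list.
    regular = ", ".join(
        f"{name} {hive_type}"
        for name, hive_type in columns.items()
        if name not in partition_cols
    )
    parts = ", ".join(f"{name} {columns[name]}" for name in partition_cols)
    return f"CREATE TABLE {table} ({regular}) PARTITIONED BY ({parts})"


# Hypothetical table and columns, not taken from the article.
ddl = build_partitioned_ddl(
    "sales_by_region",
    {"country": "STRING", "year": "INT", "sales": "INT"},
    ["country", "year"],
)
print(ddl)
# CREATE TABLE sales_by_region (sales INT) PARTITIONED BY (country STRING, year INT)
```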

You can also change the partitioning order by moving attributes up or down in the right list.
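The order matters because Hive nests partition directories in the order the partition columns are listed. A small Python sketch, using a hypothetical row and a hypothetical `partition_path` helper, shows how the same row ends up under a different path when the order changes:

```python
def partition_path(row, partition_cols):
    """Build the Hive-style partition path for one row."""
    # One directory level per partition column, in the order given.
    return "/".join(f"{col}={row[col]}" for col in partition_cols)


row = {"country": "US", "year": 2016, "sales": 12}  # hypothetical row

print(partition_path(row, ["country", "year"]))  # country=US/year=2016
print(partition_path(row, ["year", "country"]))  # year=2016/country=US
```

A reasonable rule of thumb is to put the attribute you filter on most often first, so that it forms the top directory level.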

If your attribute is not visible in the left list, you can type its name manually in the highlighted box shown below and then click the plus icon to add it to the list.

[Screenshot: highlighted text box and plus icon for adding an attribute name manually, RapidMiner Studio Developer 7.2.000]
