RapidMiner

Change storage location on Hadoop

by RMStaff on ‎08-04-2016 01:44 PM

RapidMiner Radoop lets you do code-free data preparation, blending, and cleansing in a distributed fashion on Hadoop. Often the data needs to be stored in Hadoop once the cleansing steps are complete, and Radoop’s “Store in Hive” operator is generally an excellent way to do that. Sometimes, however, you need to control the location (directory) where the data is stored rather than relying on Hive to manage it.

 

To see the options needed for this, make sure you have chosen to show the advanced parameters for the operator.

 

Store in Hive.png

To specify a custom location, you can still use the “Store in Hive” operator and enter the location in the box highlighted below.

 

2016-08-04 18_41_26-Cortana.png

 

 

The path can be an external location on HDFS or on Amazon S3. For Amazon S3, use the s3://<bucket>/<path> or s3n://<bucket>/<path> format to specify the destination directory (it will be created if it does not exist). Please note that in this case the target directory cannot be checked or emptied beforehand, since it cannot be accessed directly without AWS credentials.
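
Because an S3 destination cannot be checked beforehand, it may help to sanity-check the location string before pasting it into the operator. The following is a minimal sketch in Python, not part of Radoop: the helper name classify_location, the example bucket and paths, and the choice to accept hdfs:// URIs alongside plain absolute paths are all assumptions made for illustration. It only checks the shape of the string against the formats named above and does not contact HDFS or AWS.

```python
from urllib.parse import urlparse

# URI schemes the article names for Amazon S3 destinations.
S3_SCHEMES = {"s3", "s3n"}


def classify_location(location):
    """Return (kind, container, path) for a custom storage location.

    Hypothetical helper for illustration only; it is not a Radoop API.
    """
    parsed = urlparse(location)

    if parsed.scheme in S3_SCHEMES:
        # s3://<bucket>/<path>: the bucket is parsed as the network location.
        key_prefix = parsed.path.strip("/")
        if not parsed.netloc or not key_prefix:
            raise ValueError("expected s3://<bucket>/<path>, got: " + location)
        return ("s3", parsed.netloc, key_prefix)

    if parsed.scheme in ("", "hdfs") and parsed.path.startswith("/"):
        # Plain absolute path, or hdfs://<namenode>/<path> (assumed acceptable).
        return ("hdfs", parsed.netloc, parsed.path)

    raise ValueError("unrecognized location: " + location)


if __name__ == "__main__":
    # Example values are made up for this sketch.
    for loc in ("/user/radoop/cleansed", "s3://my-bucket/cleansed/customers"):
        print(classify_location(loc))
```

A string that passes this check is only well-formed; whether the directory is writable still depends on the cluster configuration and AWS credentials.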