Change storage location on Hadoop

bhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist

RapidMiner Radoop allows you to do code-free data prep, blending, and cleansing in a distributed fashion on Hadoop. Often the data needs to be stored in Hadoop once the cleansing steps are complete, and Radoop’s “Store in Hive” operator is generally an excellent way to do that. But sometimes you need to control the location (directory) where the data is stored rather than relying on Hive to manage it.

 

To see the option needed for this, make sure you have selected to show the advanced parameters for the operator.

 

2016-08-04 18_40_27-RapidMiner - EY processes review and best practices - Meeting.png

To specify a custom location, you can still use the “Store in Hive” operator and enter the location in the box highlighted below.

 

2016-08-04 18_41_26-Cortana.png
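
For readers who think in Hive terms, a custom storage location corresponds to the LOCATION clause of a CREATE EXTERNAL TABLE statement. The Python sketch below only builds such a statement as a string to show what the setting means. It is an illustration, not a description of what Radoop runs internally: the helper name, table name, columns, and path are all hypothetical.

```python
def build_external_table_ddl(table, columns, location):
    """Build a HiveQL statement for an external table stored at `location`.

    `columns` is a list of (name, hive_type) pairs. All names used here are
    hypothetical; this is not the statement Radoop itself generates.
    """
    if not columns:
        raise ValueError("at least one column is required")
    if "'" in location:
        # Keep the sketch simple: refuse paths that would need quote escaping.
        raise ValueError("location must not contain single quotes")
    column_list = ", ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE `{table}` ({column_list}) "
        f"LOCATION '{location}'"
    )


# Hypothetical table and directory, for illustration only.
ddl = build_external_table_ddl(
    "cleansed_customers",
    [("id", "BIGINT"), ("name", "STRING")],
    "/user/radoop/cleansed_customers",
)
print(ddl)
```

With an external table, Hive records the directory you name instead of choosing one under its own warehouse directory, which is the kind of control the location parameter gives you.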

 

 

The path can be an external location on HDFS or on Amazon S3. For Amazon S3, use the s3://<bucket>/<path> or s3n://<bucket>/<path> format to specify the destination directory (it will be created if it does not exist). Please note that in this case the target directory cannot be checked or emptied beforehand, since it cannot be accessed directly without AWS credentials.
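
If you generate location values in a script before pasting them into the operator, a quick format check can catch typos early. The following is a minimal Python sketch, assuming that HDFS locations are given either as absolute paths or as hdfs:// URIs. The function name and the `checked_before_write` flag are hypothetical. The flag simply restates the note above for S3, and treating HDFS as checkable is an inference from that note, not something the operator's documentation is quoted on here.

```python
from urllib.parse import urlparse

S3_SCHEMES = {"s3", "s3n"}


def describe_location(location):
    """Classify a storage location string as HDFS or Amazon S3.

    Hypothetical helper for illustration; it is not part of Radoop.
    """
    parsed = urlparse(location)
    if parsed.scheme in S3_SCHEMES:
        path = parsed.path.strip("/")
        if not parsed.netloc or not path:
            raise ValueError("expected s3://<bucket>/<path> or s3n://<bucket>/<path>")
        return {
            "storage": "s3",
            "bucket": parsed.netloc,
            "path": path,
            # Per the note above: the S3 target cannot be checked or emptied beforehand.
            "checked_before_write": False,
        }
    if parsed.scheme in ("", "hdfs"):
        if not parsed.path.startswith("/"):
            raise ValueError("expected an absolute HDFS path")
        return {
            "storage": "hdfs",
            "bucket": None,
            "path": parsed.path,
            # Assumption: inferred by contrast with the S3 case.
            "checked_before_write": True,
        }
    raise ValueError(f"unsupported scheme: {parsed.scheme!r}")


# Hypothetical bucket and paths, for illustration only.
s3_info = describe_location("s3n://my-bucket/radoop/output")
hdfs_info = describe_location("/user/radoop/output")
print(s3_info)
print(hdfs_info)
```

Because the S3 directory is not emptied beforehand, it is worth pointing each run at a directory you know is empty, or clearing it yourself with your own AWS tooling.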
