RapidMiner

Store in hive using custom SerDe

by RMStaff, 08-04-2016 03:01 PM (edited 08-04-2016 03:05 PM)

RapidMiner Radoop’s “Store in Hive” operator is a versatile operator that lets you save data to Hive tables or external tables. This article describes how to enable custom storage and choose a row format, such as DELIMITED or a custom SerDe, when storing.

Make sure the advanced parameters are enabled if you need to use the DELIMITED row format.

[Screenshot: Store in Hive parameters with the advanced parameters shown]

Once you select the custom storage option, additional parameters appear. Change the row format parameter to "Custom SerDe", as highlighted below.

[Screenshot: row format parameter set to "Custom SerDe"]

Then provide the SerDe class name. Make sure this class exists on the classpath of the Hive server.

Additional SerDe properties can be set by clicking "Edit List". These case-sensitive key-value pairs are passed on to the table's SerDe.
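
If you think in HiveQL, the SerDe class name and properties correspond roughly to a ROW FORMAT SERDE ... WITH SERDEPROPERTIES (...) clause. The Python sketch below is a hypothetical helper (serde_table_ddl is not part of Radoop, and the DDL Radoop actually generates may differ) that assembles such a statement. It assumes a TEXTFILE table and uses Hive's built-in OpenCSVSerde (available in Hive 0.14 and later) with its separatorChar and quoteChar properties purely as an example.

```python
from typing import Mapping, Optional, Sequence, Tuple


def quote_hive_string(value: str) -> str:
    """Wrap a value in single quotes, escaping backslashes and single quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def serde_table_ddl(
    table: str,
    columns: Sequence[Tuple[str, str]],
    serde_class: str,
    serde_props: Optional[Mapping[str, str]] = None,
) -> str:
    """Hypothetical helper: build a CREATE TABLE statement that uses a custom SerDe."""
    cols = ",\n  ".join(f"`{name}` {hive_type}" for name, hive_type in columns)
    ddl = (
        f"CREATE TABLE `{table}` (\n  {cols}\n)\n"
        f"ROW FORMAT SERDE {quote_hive_string(serde_class)}"
    )
    if serde_props:
        # Keys are emitted exactly as given: SerDe property names are case sensitive.
        props = ",\n  ".join(
            f"{quote_hive_string(key)} = {quote_hive_string(val)}"
            for key, val in serde_props.items()
        )
        ddl += f"\nWITH SERDEPROPERTIES (\n  {props}\n)"
    return ddl + "\nSTORED AS TEXTFILE"


if __name__ == "__main__":
    # OpenCSVSerde reads every column as STRING, so all columns are declared STRING.
    print(serde_table_ddl(
        "customers_csv",
        [("id", "STRING"), ("name", "STRING")],
        "org.apache.hadoop.hive.serde2.OpenCSVSerde",
        {"separatorChar": ",", "quoteChar": '"'},
    ))
```

As in the operator, the property keys are passed through unchanged, so "separatorChar" and "separatorchar" would be two different properties.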

For a list of the built-in SerDes and instructions on writing your own, see https://cwiki.apache.org/confluence/display/Hive/SerDe

You can also choose additional Hive file format or Impala file format settings among the additional options. Note that older Hive versions may not support some of the file formats. As of Radoop version 7.2 (August 2016), the supported Hive file formats are TEXTFILE, RCFILE, ORC, SEQUENCEFILE, PARQUET, and a custom format.
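
In HiveQL, the chosen Hive file format becomes a STORED AS clause. The sketch below is a hypothetical pre-check (not Radoop code) that rejects a format the target Hive version is too old for. The minimum versions come from the "available in Hive X and later" notes in the Hive DDL documentation; treat them as an assumption and verify them against your distribution.

```python
from typing import Tuple

# Minimum Hive version for each STORED AS keyword (assumed from the Hive DDL docs).
MIN_HIVE_VERSION = {
    "TEXTFILE": (0, 0),
    "SEQUENCEFILE": (0, 0),
    "RCFILE": (0, 6),
    "ORC": (0, 11),
    "PARQUET": (0, 13),
}


def stored_as_clause(file_format: str, hive_version: Tuple[int, int]) -> str:
    """Return a STORED AS clause, or raise ValueError if Hive is too old."""
    fmt = file_format.upper()
    if fmt not in MIN_HIVE_VERSION:
        raise ValueError(f"unknown file format: {file_format}")
    required = MIN_HIVE_VERSION[fmt]
    # Tuples compare element by element: (0, 12) < (0, 13) and (1, 2) > (0, 13).
    if hive_version < required:
        raise ValueError(
            f"STORED AS {fmt} needs Hive {required[0]}.{required[1]} or later"
        )
    return f"STORED AS {fmt}"


if __name__ == "__main__":
    print(stored_as_clause("orc", (1, 2)))  # STORED AS ORC
    try:
        stored_as_clause("parquet", (0, 12))
    except ValueError as err:
        print(err)  # STORED AS PARQUET needs Hive 0.13 or later
```

Checking up front gives a clear message instead of a DDL error from an older Hive server.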

When you select the custom format, additional parameters for the input format and output format classes are exposed.
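
With a custom format, the HiveQL names the classes directly: STORED AS INPUTFORMAT '...' OUTPUTFORMAT '...'. Here is a minimal sketch, again a hypothetical helper rather than Radoop's implementation, that checks both values look like fully qualified Java class names before building the clause. The example uses the classes Hive itself uses for TEXTFILE tables; like a SerDe, any custom input or output format class has to be on the Hive server's classpath.

```python
import re

# A dotted Java class name such as org.example.MyInputFormat (package required).
_JAVA_CLASS = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)+")


def custom_format_clause(input_format: str, output_format: str) -> str:
    """Hypothetical helper: build a STORED AS INPUTFORMAT/OUTPUTFORMAT clause."""
    for name in (input_format, output_format):
        # fullmatch rejects quotes, spaces and unqualified names, so the values
        # can be embedded in single-quoted literals safely.
        if not _JAVA_CLASS.fullmatch(name):
            raise ValueError(f"not a fully qualified class name: {name!r}")
    return (
        f"STORED AS INPUTFORMAT '{input_format}'\n"
        f"  OUTPUTFORMAT '{output_format}'"
    )


if __name__ == "__main__":
    print(custom_format_clause(
        "org.apache.hadoop.mapred.TextInputFormat",
        "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
    ))
```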