[SOLVED] Missing values and R

tennenrishin Member Posts: 177 Contributor II
edited September 2019 in Help
Background: I would like to have a simple method of generating attributes using R.

The process below generates an attribute c=a+b, in two ways:
  • using the Generate Attributes operator
  • using the Execute Script (R) operator, with script "data$c <- data$a + data$b"
The problem is that the latter method alters missing values in other attributes that the script never touches, as you can see by comparing attribute z in the two outputs of the process. Why does this happen?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
   <process expanded="true" height="654" width="1015">
     <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="179" y="30">
       <list key="attribute_values">
         <parameter key="a" value="1"/>
         <parameter key="b" value="2"/>
       </list>
       <list key="set_additional_roles"/>
     </operator>
     <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="179" y="120">
       <list key="attribute_values">
         <parameter key="a" value="1"/>
         <parameter key="b" value="2"/>
         <parameter key="z" value="&quot;hi&quot;"/>
       </list>
       <list key="set_additional_roles"/>
     </operator>
     <operator activated="true" class="union" compatibility="5.2.008" expanded="true" height="76" name="Union" width="90" x="380" y="30"/>
     <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="648" y="30">
       <list key="function_descriptions">
         <parameter key="c" value="a+b"/>
       </list>
     </operator>
     <operator activated="true" class="r:execute_script_r" compatibility="5.2.000" expanded="true" height="76" name="gen attribute c" width="90" x="648" y="120">
       <parameter key="script" value="data$c &lt;- data$a + data$b"/>
       <enumeration key="inputs">
         <parameter key="name_of_variable" value="data"/>
       </enumeration>
       <list key="results">
         <parameter key="data" value="Data Table"/>
       </list>
     </operator>
     <connect from_op="Generate Data by User Specification" from_port="output" to_op="Union" to_port="example set 1"/>
     <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Union" to_port="example set 2"/>
     <connect from_op="Union" from_port="union" to_op="Generate Attributes" to_port="example set input"/>
     <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
     <connect from_op="Generate Attributes" from_port="original" to_op="gen attribute c" to_port="input 1"/>
     <connect from_op="gen attribute c" from_port="output 1" to_port="result 2"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="72"/>
     <portSpacing port="sink_result 3" spacing="180"/>
   </process>
 </operator>
</process>
Answers

  • tennenrishin Member Posts: 177 Contributor II
    The problem seems to affect only nominal missing values; R appears to import and export numerical missing values without trouble. So I guess the solution is to replace all nominal missing values with some predefined token before calling the R script, and to turn that token back into missing values afterward (e.g. with Declare Missing Value).
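
The token round-trip described above can be sketched outside RapidMiner. This is only an illustration in plain Python, not RapidMiner or R code: the token `__MISSING__`, the helper names `mask_missing` and `unmask_missing`, and the `add_c` function standing in for the R script are all hypothetical, and missing values are modelled as `None`.

```python
# Hypothetical placeholder token; pick a string that cannot occur in the real data.
MISSING_TOKEN = "__MISSING__"


def mask_missing(rows, nominal_columns, token=MISSING_TOKEN):
    """Return copies of the rows with missing (None) nominal values replaced by the token."""
    return [
        {k: (token if k in nominal_columns and v is None else v) for k, v in row.items()}
        for row in rows
    ]


def unmask_missing(rows, nominal_columns, token=MISSING_TOKEN):
    """Return copies of the rows with the token turned back into missing (None) values."""
    return [
        {k: (None if k in nominal_columns and v == token else v) for k, v in row.items()}
        for row in rows
    ]


def add_c(rows):
    """Stand-in for the R script: c = a + b on every row."""
    return [{**row, "c": row["a"] + row["b"]} for row in rows]


# Same shape as the union in the process: z is missing in the first example.
rows = [
    {"a": 1, "b": 2, "z": None},
    {"a": 1, "b": 2, "z": "hi"},
]

masked = mask_missing(rows, {"z"})              # before the script
result = unmask_missing(add_c(masked), {"z"})   # after the script
```

In the actual process, the masking and unmasking steps would be separate operators placed before and after the Execute Script (R) operator; the sketch only shows that the missing value in z survives as long as the script never sees it as missing.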