[SOLVED] Missing values and R

tennenrishin Member Posts: 177 Contributor II
edited September 2019 in Help
Background: I would like to have a simple method of generating attributes using R.

The process below generates an attribute c=a+b, in two ways:
  • using the Generate Attributes operator
  • using the Execute Script (R) operator, with script "data$c <- data$a + data$b"
The problem is that the second method changes the missing values in other attributes. You can see this by comparing attribute z in the two outputs: z is missing for the example that comes from the first data set, which does not define it. Why does this happen?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
   <process expanded="true" height="654" width="1015">
     <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="179" y="30">
       <list key="attribute_values">
         <parameter key="a" value="1"/>
         <parameter key="b" value="2"/>
       </list>
       <list key="set_additional_roles"/>
     </operator>
     <operator activated="true" class="generate_data_user_specification" compatibility="5.2.008" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="179" y="120">
       <list key="attribute_values">
         <parameter key="a" value="1"/>
         <parameter key="b" value="2"/>
         <parameter key="z" value="&quot;hi&quot;"/>
       </list>
       <list key="set_additional_roles"/>
     </operator>
     <operator activated="true" class="union" compatibility="5.2.008" expanded="true" height="76" name="Union" width="90" x="380" y="30"/>
     <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="648" y="30">
       <list key="function_descriptions">
         <parameter key="c" value="a+b"/>
       </list>
     </operator>
     <operator activated="true" class="r:execute_script_r" compatibility="5.2.000" expanded="true" height="76" name="gen attribute c" width="90" x="648" y="120">
       <parameter key="script" value="data$c &lt;- data$a + data$b"/>
       <enumeration key="inputs">
         <parameter key="name_of_variable" value="data"/>
       </enumeration>
       <list key="results">
         <parameter key="data" value="Data Table"/>
       </list>
     </operator>
     <connect from_op="Generate Data by User Specification" from_port="output" to_op="Union" to_port="example set 1"/>
     <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Union" to_port="example set 2"/>
     <connect from_op="Union" from_port="union" to_op="Generate Attributes" to_port="example set input"/>
     <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
     <connect from_op="Generate Attributes" from_port="original" to_op="gen attribute c" to_port="input 1"/>
     <connect from_op="gen attribute c" from_port="output 1" to_port="result 2"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="72"/>
     <portSpacing port="sink_result 3" spacing="180"/>
   </process>
 </operator>
</process>

Answers

  • tennenrishin Member Posts: 177 Contributor II
    The problem seems to affect only nominal missing values; R appears to import and export numerical missing values correctly. So I guess the solution is to replace all nominal missing values with some predefined token before calling the R script, and to convert that token back into missing values afterward with Declare Missing Value.
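
    To make the idea concrete, here is a minimal sketch of the token round trip, written in Python rather than as RapidMiner operators. It only illustrates the logic and is not what RapidMiner or the R extension does internally. The token `__MISSING__`, the helper names and the `add_c` stand-in for the R script are all hypothetical.

    ```python
    # Hypothetical placeholder; pick any string that cannot occur as a real value.
    MISSING_TOKEN = "__MISSING__"


    def encode_nominal_missing(rows, nominal_attrs, token=MISSING_TOKEN):
        """Replace missing (None) nominal values with a token before the script runs."""
        return [
            {k: (token if k in nominal_attrs and v is None else v) for k, v in row.items()}
            for row in rows
        ]


    def decode_nominal_missing(rows, nominal_attrs, token=MISSING_TOKEN):
        """Turn the token back into missing values (None) after the script runs."""
        return [
            {k: (None if k in nominal_attrs and v == token else v) for k, v in row.items()}
            for row in rows
        ]


    def add_c(rows):
        """Stand-in for the R script data$c <- data$a + data$b."""
        return [dict(row, c=row["a"] + row["b"]) for row in rows]


    # Same shape as the Union output: z is missing in the first example.
    data = [{"a": 1, "b": 2, "z": None}, {"a": 1, "b": 2, "z": "hi"}]

    encoded = encode_nominal_missing(data, {"z"})
    result = decode_nominal_missing(add_c(encoded), {"z"})
    ```

    The script step never sees a nominal missing value, only the token, so whatever happens to nominal missing values on the way into and out of R no longer matters. The token has to be a string that cannot appear as a genuine value of the attribute.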