Replace missing values based on other attributes

1640607mortel Member Posts: 1 Learner I
edited June 2019 in Help

Hey all,

I am new to RapidMiner and I have a question regarding data preparation. It looks a bit like this question, but I can't figure out how to apply it to my situation.

 

I have a dataset regarding accidents, and this dataset contains the following attributes:

Total Fatal Injuries   Total Serious Injuries   Total Minor Injuries   Total Uninjured
Missing                Missing                  Missing                1
Missing                Missing                  2                      Missing
2                      2                        Missing                Missing
1                      Missing                  Missing                Missing
Missing                Missing                  Missing                Missing

I would like to fill in the missing values with "0" only if at least one of the four attributes contains a value.
I am not very experienced with RapidMiner, and I'm learning through a book called "Data Mining for the Masses". Unfortunately, the book doesn't go into detail on this kind of problem.

I already tried to use the Generate Attributes operator with the following code, but I am not skilled enough to get it to work:
if([ Total Uninjured ]>0, if(missing([ Total Fatal Injuries ])), then(replace(0)))

I tried to tell the program that if "Total Uninjured" is greater than "0" and "Total Fatal Injuries" is missing, replace "Total Fatal Injuries" with 0.

Any help would be greatly appreciated!

Answers

  • jczogalla Employee, Member Posts: 144 RM Engineering

    Hi @1640607mortel!

     

    You can use Filter Examples with custom filters to pick only those examples that have at least one non-missing value, replace the missing values in those, and then append the unmatched examples (i.e. the rows where everything is missing). Here is an example process. I added an ID to keep track of where the originally "all missing" rows were.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.003">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="generate_data" compatibility="9.0.003" expanded="true" height="68" name="Generate Data" width="90" x="45" y="85">
    <parameter key="number_of_attributes" value="4"/>
    </operator>
    <operator activated="true" class="generate_id" compatibility="9.0.003" expanded="true" height="82" name="Generate ID" width="90" x="179" y="85"/>
    <operator activated="true" class="set_macro" compatibility="9.0.003" expanded="true" height="82" name="Set Macro" width="90" x="313" y="85">
    <parameter key="macro" value="range"/>
    <parameter key="value" value="8"/>
    </operator>
    <operator activated="true" class="subprocess" compatibility="9.0.003" expanded="true" height="82" name="Create Missing" width="90" x="447" y="85">
    <process expanded="true">
    <operator activated="true" class="declare_missing_value" compatibility="9.0.003" expanded="true" height="82" name="Declare Missing Value" width="90" x="45" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="att1"/>
    <parameter key="mode" value="expression"/>
    <parameter key="expression_value" value="att1 &lt; eval(%{range}) &amp;&amp; att1 &gt; -eval(%{range})"/>
    </operator>
    <operator activated="true" class="declare_missing_value" compatibility="9.0.003" expanded="true" height="82" name="Declare Missing Value (2)" width="90" x="179" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="att2"/>
    <parameter key="mode" value="expression"/>
    <parameter key="expression_value" value="att2 &lt; eval(%{range}) &amp;&amp; att2 &gt; -eval(%{range})"/>
    </operator>
    <operator activated="true" class="declare_missing_value" compatibility="9.0.003" expanded="true" height="82" name="Declare Missing Value (3)" width="90" x="313" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="att3"/>
    <parameter key="mode" value="expression"/>
    <parameter key="expression_value" value="att3 &lt; eval(%{range}) &amp;&amp; att3 &gt; -eval(%{range})"/>
    </operator>
    <operator activated="true" class="declare_missing_value" compatibility="9.0.003" expanded="true" height="82" name="Declare Missing Value (4)" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="att4"/>
    <parameter key="mode" value="expression"/>
    <parameter key="expression_value" value="att4 &lt; eval(%{range}) &amp;&amp; att4 &gt; -eval(%{range})"/>
    </operator>
    <connect from_port="in 1" to_op="Declare Missing Value" to_port="example set input"/>
    <connect from_op="Declare Missing Value" from_port="example set output" to_op="Declare Missing Value (2)" to_port="example set input"/>
    <connect from_op="Declare Missing Value (2)" from_port="example set output" to_op="Declare Missing Value (3)" to_port="example set input"/>
    <connect from_op="Declare Missing Value (3)" from_port="example set output" to_op="Declare Missing Value (4)" to_port="example set input"/>
    <connect from_op="Declare Missing Value (4)" from_port="example set output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="filter_examples" compatibility="9.0.003" expanded="true" height="103" name="Filter all missing" width="90" x="581" y="85">
    <list key="filters_list">
    <parameter key="filters_entry_key" value="att1.is_not_missing."/>
    <parameter key="filters_entry_key" value="att2.is_not_missing."/>
    <parameter key="filters_entry_key" value="att3.is_not_missing."/>
    <parameter key="filters_entry_key" value="att4.is_not_missing."/>
    </list>
    <parameter key="filters_logic_and" value="false"/>
    </operator>
    <operator activated="true" class="replace_missing_values" compatibility="9.0.003" expanded="true" height="103" name="Replace Missing Values" width="90" x="715" y="34">
    <parameter key="attribute_filter_type" value="subset"/>
    <parameter key="attributes" value="att4|att3|att2|att1"/>
    <parameter key="default" value="zero"/>
    <list key="columns"/>
    </operator>
    <operator activated="true" class="append" compatibility="9.0.003" expanded="true" height="103" name="Append" width="90" x="916" y="85"/>
    <operator activated="true" class="sort" compatibility="9.0.003" expanded="true" height="82" name="Sort by ID" width="90" x="1050" y="85">
    <parameter key="attribute_name" value="id"/>
    </operator>
    <connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Set Macro" to_port="through 1"/>
    <connect from_op="Set Macro" from_port="through 1" to_op="Create Missing" to_port="in 1"/>
    <connect from_op="Create Missing" from_port="out 1" to_op="Filter all missing" to_port="example set input"/>
    <connect from_op="Filter all missing" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
    <connect from_op="Filter all missing" from_port="unmatched example set" to_op="Append" to_port="example set 2"/>
    <connect from_op="Replace Missing Values" from_port="example set output" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_op="Sort by ID" to_port="example set input"/>
    <connect from_op="Sort by ID" from_port="example set output" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
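
    For readers who want to check the logic outside RapidMiner, the same steps (ID, filter, replace, append, sort) can be sketched in plain Python. This is only an illustration, not part of the process above: the function name fill_partial_rows is made up, None stands in for a missing value, and the sample rows are the ones from the question.

```python
# Column names taken from the question; None stands in for a missing value.
COLUMNS = [
    "Total Fatal Injuries",
    "Total Serious Injuries",
    "Total Minor Injuries",
    "Total Uninjured",
]


def fill_partial_rows(rows):
    """Zero-fill rows that have at least one value; leave all-missing rows alone.

    Hypothetical helper that mirrors the process steps, not a RapidMiner API.
    """
    # "Generate ID": remember the original position of every row.
    indexed = [dict(row, id=i) for i, row in enumerate(rows, start=1)]

    # "Filter Examples" with OR logic: matched rows have at least one value.
    matched = [r for r in indexed if any(r[c] is not None for c in COLUMNS)]
    unmatched = [r for r in indexed if all(r[c] is None for c in COLUMNS)]

    # "Replace Missing Values" (default: zero) on the matched rows only.
    for r in matched:
        for c in COLUMNS:
            if r[c] is None:
                r[c] = 0

    # "Append" both parts, then "Sort by ID" to restore the original order.
    return sorted(matched + unmatched, key=lambda r: r["id"])


# The five sample rows from the question.
rows = [
    dict(zip(COLUMNS, values))
    for values in [
        (None, None, None, 1),
        (None, None, 2, None),
        (2, 2, None, None),
        (1, None, None, None),
        (None, None, None, None),
    ]
]

for row in fill_partial_rows(rows):
    print(row["id"], [row[c] for c in COLUMNS])
```

    The first four rows come back zero-filled, and the last row stays entirely missing because it never passes the filter.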

    I hope that helped!
    Cheers

    Jan

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi,

     

    I'm on a computer without RM, but from memory I think the Generate Aggregation operator will do the trick: it can create a new attribute that says whether the other 4 attributes are missing or not. You can then use that attribute to filter.
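
    As a rough sketch of that idea in plain Python (not RapidMiner, and not a claim about how Generate Aggregation behaves): add a helper attribute that counts the non-missing values per row, then fill only the rows where that count is above zero. The attribute name non_missing_count and both function names are assumptions made for illustration, and None stands in for a missing value.

```python
# Column names taken from the question; None stands in for a missing value.
COLUMNS = [
    "Total Fatal Injuries",
    "Total Serious Injuries",
    "Total Minor Injuries",
    "Total Uninjured",
]


def add_non_missing_count(row):
    """Return a copy of the row with a helper attribute (hypothetical name)
    holding the number of non-missing values among the four columns."""
    count = sum(row[c] is not None for c in COLUMNS)
    return dict(row, non_missing_count=count)


def fill_if_flagged(row):
    """Replace missing values with 0 unless the whole row is missing."""
    if row["non_missing_count"] == 0:
        return row  # all four values are missing: leave the row untouched
    return {key: (0 if value is None else value) for key, value in row.items()}


# One partially filled row and one all-missing row from the question.
partial = add_non_missing_count(dict(zip(COLUMNS, (None, None, 2, None))))
empty = add_non_missing_count(dict(zip(COLUMNS, (None, None, None, None))))

print(fill_if_flagged(partial))
print(fill_if_flagged(empty))
```

    Compared with splitting and appending, this keeps the rows in one table, so no ID or sort step is needed.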

     

    Regards,

    Sebastian
