How can I count the number of the missing values in a row?

dodi55dodi55 Member Posts: 1 Newbie
I'm a beginner, and I would like to know that the data lack in a row is repairable or not. It depends on the number o the missing values in a row therefore I would like to know how can I calculate it.
Tagged:

Answers

  • Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Here is one simple way to do this using loops.
    <?xml version="1.0" encoding="UTF-8"?><process version="9.4.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.4.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.4.001" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="85">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="id,att1,att2,att3&#10;1,1,2,3&#10;2,4,,6&#10;3,5,,,&#10;4,,,,"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="parse_numbers" compatibility="9.4.001" expanded="true" height="82" name="Parse Numbers" width="90" x="179" y="85">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="decimal_character" value="."/>
            <parameter key="grouped_digits" value="false"/>
            <parameter key="grouping_character" value=","/>
            <parameter key="infinity_representation" value=""/>
            <parameter key="unparsable_value_handling" value="fail"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.4.001" expanded="true" height="82" name="Set Role" width="90" x="313" y="85">
            <parameter key="attribute_name" value="id"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.4.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="447" y="85">
            <list key="function_descriptions">
              <parameter key="total_missings" value="0"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="concurrency:loop_attributes" compatibility="9.4.001" expanded="true" height="82" name="Loop Attributes" width="90" x="581" y="85">
            <parameter key="attribute_filter_type" value="regular_expression"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="regular_expression" value="att.+"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="attribute_name_macro" value="loop_attribute"/>
            <parameter key="reuse_results" value="true"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="generate_attributes" compatibility="9.4.001" expanded="true" height="82" name="Generate Attributes" width="90" x="246" y="85">
                <list key="function_descriptions">
                  <parameter key="total_missings" value="if(missing(eval(%{loop_attribute})),total_missings+1,total_missings)"/>
                </list>
                <parameter key="keep_all" value="true"/>
              </operator>
              <connect from_port="input 1" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_port="output 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Parse Numbers" to_port="example set input"/>
          <connect from_op="Parse Numbers" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Loop Attributes" to_port="input 1"/>
          <connect from_op="Loop Attributes" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    
    But I seem to recall this coming up in a prior thread, @mschmitz was there a more elegant way of doing this?  Did you create an operator or option in the toolbox for this?  If so, I couldn't find it.

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • tftemmetftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
    edited October 2019
    Hi @dodi55, @Telcontar120,

    [edit: I just realized that this is totally wrong ;-). The operator does not calculate this attribute, it is just created in the tutorial process of the operator. The tutorial process also uses loop attributes as @Telcontar120 is using. So I don't know of a better method. You may still want to have a look into the operator if you want to filter Examples depending on the number of missings per Example.]

    The "Filter Examples with Missing Values" operator from the Operator Toolbox extension does the trick. The actual goal of the operator is to remove Examples (=rows) with more than a specific number of missing attributes (this can be configured). But it also adds a new attribute with the number of missing attributes for each row. If you set the filter method to "keep all" (and maybe include special attributes, if you want to count them too), it is exactly what you are looking for.

    Best regards,
    Fabian

Sign In or Register to comment.