How to delete rows based on a list of values

Ritika
Hi! I have two datasets. The first is a large set of names with information associated with each name, and the second is a smaller set containing only names. I want to delete the rows in the first set whose names are not in the second set. I know this is possible with the "Filter Examples" operator, but I do not want to enter the filters manually (there are more than 100). Is there an operator that can read one file and delete the matching rows in another?

Answers

  • lionelderkrikor (Moderator, RapidMiner Certified Analyst)
    Hi @Ritika,

    You can find in the attached file an example process that performs your task using the Set Minus operator.
    You can adapt it to your use case.

    Hope this helps,

    Regards,

    Lionel


  • Ritika

    Hello Lionel,

    I get the same "malformed" error when opening the attached file. Sorry about this. Could you post the process code?

  • lionelderkrikor (Moderator, RapidMiner Certified Analyst)
    @Ritika,

    Yes, sure:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.9.002">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.9.002" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.9.002" expanded="true" height="68" name="Create ExampleSet" width="90" x="179" y="85">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="Att1,Att2&#10;michael,1&#10;Lionel,2 &#10;Scott,3&#10;Brian,4&#10;Varun,5&#10;Jacob,6&#10;Martin,7&#10;Ingo,8&#10;Kayman,9"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="utility:create_exampleset" compatibility="9.9.002" expanded="true" height="68" name="Create ExampleSet (2)" width="90" x="179" y="187">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="Att1&#10;Lionel&#10;Ingo&#10;Brian"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.9.002" expanded="true" height="82" name="Set Role" width="90" x="447" y="187">
            <parameter key="attribute_name" value="Att1"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.9.002" expanded="true" height="103" name="Multiply (2)" width="90" x="313" y="34"/>
          <operator activated="true" class="set_role" compatibility="9.9.002" expanded="true" height="82" name="Set Role (3)" width="90" x="648" y="34">
            <parameter key="attribute_name" value="Att1"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.9.002" expanded="true" height="82" name="Set Role (2)" width="90" x="447" y="34">
            <parameter key="attribute_name" value="Att1"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="set_minus" compatibility="9.9.002" expanded="true" height="82" name="Set Minus" width="90" x="581" y="187"/>
          <operator activated="true" class="set_minus" compatibility="9.9.002" expanded="true" height="82" name="Set Minus (2)" width="90" x="782" y="136"/>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Multiply (2)" to_port="input"/>
          <connect from_op="Create ExampleSet (2)" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Set Minus" to_port="subtrahend"/>
          <connect from_op="Multiply (2)" from_port="output 1" to_op="Set Role (2)" to_port="example set input"/>
          <connect from_op="Multiply (2)" from_port="output 2" to_op="Set Role (3)" to_port="example set input"/>
          <connect from_op="Set Role (3)" from_port="example set output" to_op="Set Minus (2)" to_port="example set input"/>
          <connect from_op="Set Role (2)" from_port="example set output" to_op="Set Minus" to_port="example set input"/>
          <connect from_op="Set Minus" from_port="example set output" to_op="Set Minus (2)" to_port="subtrahend"/>
          <connect from_op="Set Minus (2)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Regards,

    Lionel
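The process above chains two Set Minus operators. As I understand it, Set Minus matches examples by their id attribute, which is why each input first goes through Set Role with Att1 set to the id role. The first Set Minus removes the listed names from the large set, leaving the rows to discard. The second removes those rows from a copy of the original set, leaving only the rows whose name is in the list. Below is a minimal Python sketch of that logic, not RapidMiner code. It assumes rows are (name, value) tuples with exact, case-sensitive name matching. The helper names `set_minus` and `keep_listed` are hypothetical.

```python
def set_minus(rows, subtrahend_ids):
    """Mimic Set Minus: keep rows whose id (first field) is NOT in subtrahend_ids."""
    return [row for row in rows if row[0] not in subtrahend_ids]


def keep_listed(rows, names):
    """Mirror the two-step process: return rows whose name IS in names."""
    # Step 1 ("Set Minus"): rows whose name is not listed, i.e. the rows to discard.
    to_discard = set_minus(rows, set(names))
    # Step 2 ("Set Minus (2)"): subtract the discarded rows' ids from the original set.
    return set_minus(rows, {row[0] for row in to_discard})


# Sample data taken from the two Create ExampleSet operators in the process.
big = [("michael", 1), ("Lionel", 2), ("Scott", 3), ("Brian", 4), ("Varun", 5),
       ("Jacob", 6), ("Martin", 7), ("Ingo", 8), ("Kayman", 9)]
small = ["Lionel", "Ingo", "Brian"]

print(keep_listed(big, small))
# [('Lionel', 2), ('Brian', 4), ('Ingo', 8)]
```

In plain Python, a single membership test (`[r for r in big if r[0] in set(small)]`) gives the same result. The two-step form is shown only to mirror the operator chain in the process.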

  • Ritika
    Hi Lionel,

    Sorry for the late response, but yes, this worked! Is there also a way to filter rows by a partial match? I believe this process only works when the names in the table match the listed values exactly. For example, say I want to keep the name Mike, and the table contains Mike Anderson and Mike Brown. I would want to keep both rows regardless of the last name, because I'm looking for values that contain Mike.
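The follow-up asks for "contains" matching instead of exact matching. Below is a minimal Python sketch of that behaviour, not a RapidMiner operator. It assumes a row should be kept when its name contains any listed value as a substring, and it assumes case-insensitive matching. The helper names `keep_containing` and `keep_first_name` are hypothetical, and so is the sample data. Plain substring matching can over-match (for example, "Mikes" contains "Mike"), so a stricter first-name variant is included for comparison.

```python
def keep_containing(rows, values):
    """Keep rows whose name (first field) contains any value, case-insensitively."""
    needles = [v.lower() for v in values]
    return [row for row in rows if any(n in row[0].lower() for n in needles)]


def keep_first_name(rows, values):
    """Stricter variant: keep rows whose first word equals a listed value."""
    wanted = {v.lower() for v in values}
    kept = []
    for row in rows:
        words = row[0].split()
        # Skip empty names; otherwise compare only the first word.
        if words and words[0].lower() in wanted:
            kept.append(row)
    return kept


# Hypothetical sample rows for illustration.
rows = [("Mike Anderson", 1), ("Mike Brown", 2), ("Anna Smith", 3), ("Brian Mikes", 4)]

print(keep_containing(rows, ["Mike"]))
# [('Mike Anderson', 1), ('Mike Brown', 2), ('Brian Mikes', 4)]
print(keep_first_name(rows, ["Mike"]))
# [('Mike Anderson', 1), ('Mike Brown', 2)]
```

Which variant fits depends on whether "contains Mike" should also catch names such as "Brian Mikes".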