
How to delete rows based on a list of values

Ritika Member Posts: 11 Newbie
Hi! I have two datasets: the first is a large set with a list of names and the information associated with them, and the second is a smaller set containing only names. I want to delete the rows in the first set whose names are not included in the second set. I know this is possible with the Filter Examples operator, but I do not want to enter the filters manually (there are more than 100). Is there an operator that could read one file and delete the corresponding rows in another?


  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @Ritika,

    In the attached file you can find an example process that performs your task using the Set Minus operator.
    You can adapt it to your use case.

    Hope this helps,



  • Ritika Member Posts: 11 Newbie

    Hello Lionel,

    I get the same malformed error. Sorry about this. Could you send the code?

  • lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Yes, sure:

    <?xml version="1.0" encoding="UTF-8"?><process version="9.9.002">
      <operator activated="true" class="process" compatibility="9.9.002" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="utility:create_exampleset" compatibility="9.9.002" expanded="true" height="68" name="Create ExampleSet" width="90" x="179" y="85">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="Att1,Att2&#10;michael,1&#10;Lionel,2 &#10;Scott,3&#10;Brian,4&#10;Varun,5&#10;Jacob,6&#10;Martin,7&#10;Ingo,8&#10;Kayman,9"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="utility:create_exampleset" compatibility="9.9.002" expanded="true" height="68" name="Create ExampleSet (2)" width="90" x="179" y="187">
            <parameter key="generator_type" value="comma separated text"/>
            <parameter key="number_of_examples" value="100"/>
            <parameter key="use_stepsize" value="false"/>
            <list key="function_descriptions"/>
            <parameter key="add_id_attribute" value="false"/>
            <list key="numeric_series_configuration"/>
            <list key="date_series_configuration"/>
            <list key="date_series_configuration (interval)"/>
            <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
            <parameter key="time_zone" value="SYSTEM"/>
            <parameter key="input_csv_text" value="Att1&#10;Lionel&#10;Ingo&#10;Brian"/>
            <parameter key="column_separator" value=","/>
            <parameter key="parse_all_as_nominal" value="false"/>
            <parameter key="decimal_point_character" value="."/>
            <parameter key="trim_attribute_names" value="true"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.9.002" expanded="true" height="82" name="Set Role" width="90" x="447" y="187">
            <parameter key="attribute_name" value="Att1"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.9.002" expanded="true" height="103" name="Multiply (2)" width="90" x="313" y="34"/>
          <operator activated="true" class="set_role" compatibility="9.9.002" expanded="true" height="82" name="Set Role (3)" width="90" x="648" y="34">
            <parameter key="attribute_name" value="Att1"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.9.002" expanded="true" height="82" name="Set Role (2)" width="90" x="447" y="34">
            <parameter key="attribute_name" value="Att1"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="set_minus" compatibility="9.9.002" expanded="true" height="82" name="Set Minus" width="90" x="581" y="187"/>
          <operator activated="true" class="set_minus" compatibility="9.9.002" expanded="true" height="82" name="Set Minus (2)" width="90" x="782" y="136"/>
          <connect from_op="Create ExampleSet" from_port="output" to_op="Multiply (2)" to_port="input"/>
          <connect from_op="Create ExampleSet (2)" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Set Minus" to_port="subtrahend"/>
          <connect from_op="Multiply (2)" from_port="output 1" to_op="Set Role (2)" to_port="example set input"/>
          <connect from_op="Multiply (2)" from_port="output 2" to_op="Set Role (3)" to_port="example set input"/>
          <connect from_op="Set Role (3)" from_port="example set output" to_op="Set Minus (2)" to_port="example set input"/>
          <connect from_op="Set Role (2)" from_port="example set output" to_op="Set Minus" to_port="example set input"/>
          <connect from_op="Set Minus" from_port="example set output" to_op="Set Minus (2)" to_port="subtrahend"/>
          <connect from_op="Set Minus (2)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
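
For readers who want to check what the process above computes without opening RapidMiner, here is a minimal Python sketch of the same logic. It is only an illustration, not RapidMiner code: the helper `set_minus` is a hypothetical stand-in for the Set Minus operator, and the data mirrors the two Create ExampleSet operators. The first Set Minus finds the rows of the large set whose name is missing from the small set; the second Set Minus removes exactly those rows from the large set, so only rows whose name appears in the small set remain.

```python
# Sketch of the process logic with plain Python lists (not RapidMiner code).
# The data mirrors the two Create ExampleSet operators in the process.
large = [("michael", 1), ("Lionel", 2), ("Scott", 3), ("Brian", 4), ("Varun", 5),
         ("Jacob", 6), ("Martin", 7), ("Ingo", 8), ("Kayman", 9)]
small = ["Lionel", "Ingo", "Brian"]


def set_minus(rows, subtrahend_ids):
    """Hypothetical stand-in for Set Minus: keep rows whose id (first field)
    is NOT among subtrahend_ids."""
    ids = set(subtrahend_ids)
    return [row for row in rows if row[0] not in ids]


# Step 1 ("Set Minus"): rows of the large set whose name is not in the small set.
unwanted = set_minus(large, small)

# Step 2 ("Set Minus (2)"): remove those unwanted rows from the large set.
kept = set_minus(large, [name for name, _ in unwanted])

print(kept)  # [('Lionel', 2), ('Brian', 4), ('Ingo', 8)]
```

The comparison is exact and case-sensitive here, so a name must match character for character to be kept.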


  • Ritika Member Posts: 11 Newbie
    Hi Lionel,

    Sorry for the late response, but yes, this worked! Is there also a way to filter instances based on whether the name contains one of those values? I believe this process only works when the table contains those exact values. In other words, say I wanted to keep the name Mike and there are instances of Mike Anderson and Mike Brown; I would want to keep both of them regardless of the last name. I'm just looking for values that contain Mike.
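
The "contains" matching described in this question can be sketched in Python as follows. This is a minimal illustration of the matching rule only, not a RapidMiner process: the function `keep_rows_containing` is hypothetical, the row for Scott Smith is made up for the example, and ignoring upper and lower case is an assumption that the question does not state.

```python
def keep_rows_containing(rows, keywords):
    """Keep rows whose name (first field) contains any keyword as a substring.

    Matching ignores case; that is an assumption, not something the thread specifies.
    """
    lowered = [keyword.lower() for keyword in keywords]
    return [row for row in rows if any(k in row[0].lower() for k in lowered)]


# Example data: the two Mike rows come from the question, the third row is invented.
rows = [("Mike Anderson", 1), ("Mike Brown", 2), ("Scott Smith", 3)]
keywords = ["Mike"]

matches = keep_rows_containing(rows, keywords)
print(matches)  # [('Mike Anderson', 1), ('Mike Brown', 2)]
```

Note that plain substring matching would also keep a name such as "Mikey Jones"; matching on whole words would need a stricter rule.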