Remove empty rows

ccapra Member Posts: 6 Contributor II
edited November 2018 in Help
Newbie here, and I'm sure this is a dumb question :-[

I have a survey, and it doesn't necessarily get filled in every time someone opens it, so there are rows with a response ID but nothing else. How can I tell RapidMiner to get rid of those empty rows?

Answers

  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    Filter Examples is one operator you could use.

    Regards

    Andrew
  • ccapra Member Posts: 6 Contributor II
    Thanks Andrew, this pointed me in the right direction (and I checked out your blog too; it looks helpful and I'll read more of it).

    But I'm still not fully clear. I want to filter out all rows where the attribute 'Completed' is empty. I used attribute_value_filter with the parameter string "Completed=?", and that filters out all rows (even the ones where the completion date has data).

    What am I doing wrong?  

    I don't know if this helps, but here is the XML for the process I'm trying to create:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.008" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="5.3.008" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="C:\Users\Christine\Documents\Career\Consulting Business\Projects\Social Innovation Lab\Surveys\May Survey\Lime Exports\results-survey856587.xls"/>
            <parameter key="imported_cell_range" value="A1:Z38"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="date_format" value="EEE, MMM d, ''yy"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="id.true.integer.attribute"/>
              <parameter key="1" value="Completed.true.polynominal.attribute"/>
              <parameter key="2" value="Last page.true.integer.attribute"/>
              <parameter key="3" value="Start language.true.binominal.attribute"/>
              <parameter key="4" value="Date started.true.polynominal.attribute"/>
              <parameter key="5" value="Date last action.true.polynominal.attribute"/>
              <parameter key="6" value="IP address.true.polynominal.attribute"/>
              <parameter key="7" value="Referring URL.true.binominal.attribute"/>
              <parameter key="8" value="This was the fourth gathering of the Social Innovation Lab broader community\. This gathering had 3 goals\. How valuable was the May 2013 gathering for you in moving toward each of these goals- [A\. Learn new tools and ways of seeing things that can help advance social innovation].true.polynominal.attribute"/>
              <parameter key="9" value="This was the fourth gathering of the Social Innovation Lab broader community\. This gathering had 3 goals\. How valuable was the May 2013 gathering for you in moving toward each of these goals- [B\. Advance innovative projects that are working to enable sustainable and inclusive communities in Minnesota by applying new learning to featured projects, as well as to your own work\.].true.polynominal.attribute"/>
              <parameter key="10" value="This was the fourth gathering of the Social Innovation Lab broader community\. This gathering had 3 goals\. How valuable was the May 2013 gathering for you in moving toward each of these goals- [C\. Connect with other change makers, from many fields and perspectives, to share insights and build collaborative relationships\.].true.polynominal.attribute"/>
              <parameter key="11" value="Please rate the following aspects of the May Social Innovations Lab on a scale of 1-5 with 1 being Not valuable to me at all and 5 being Extremely valuable to me\. [Exploring the question of what fosters transformation].true.polynominal.attribute"/>
              <parameter key="12" value="Please rate the following aspects of the May Social Innovations Lab on a scale of 1-5 with 1 being Not valuable to me at all and 5 being Extremely valuable to me\. [Hearing stories of real transformation experienced in our community].true.polynominal.attribute"/>
              <parameter key="13" value="Please rate the following aspects of the May Social Innovations Lab on a scale of 1-5 with 1 being Not valuable to me at all and 5 being Extremely valuable to me\. [Networking and connecting opportunities].true.polynominal.attribute"/>
              <parameter key="14" value="Which prior events have you participated in- [May 2012 Lab - Human Systems Dynamics focus].true.binominal.attribute"/>
              <parameter key="15" value="Which prior events have you participated in- [September 2012 Lab - Network Weaving focus].true.binominal.attribute"/>
              <parameter key="16" value="Which prior events have you participated in- [December 2012 Lab - Vulnerability as a Resource for Leadership focus].true.binominal.attribute"/>
              <parameter key="17" value="Which prior events have you participated in- [March 2013 - Sustainable Food System in North Minneapolis].true.binominal.attribute"/>
              <parameter key="18" value="Which prior events have you participated in- [One or more Social Innovation Lab ADVISORY GROUP meetings].true.binominal.attribute"/>
              <parameter key="19" value="Based on your experience with the Social Innovation Lab to date, which of the following statements most closely resembles your own current sentiment\..true.polynominal.attribute"/>
              <parameter key="20" value="What will you do next, or differently, as a result of this Lab-.true.polynominal.attribute"/>
              <parameter key="21" value="What else would you like to see emerge from this Social Innovation Lab experiment- How else could we help our community foster transformation, in addition to hosting lab events like this one and mentoring others to host their own labs- .true.polynominal.attribute"/>
              <parameter key="22" value="Comments and suggestions about activities from our gathering\. Your comments and suggestions are very helpful in forming these activities and this community\..true.polynominal.attribute"/>
              <parameter key="23" value="Each Social Innovation Lab will feature a different framework and tool for pursuing social innovation\. Do you have suggestions for tools or frameworks to be featured in future labs-.true.polynominal.attribute"/>
              <parameter key="24" value="As someone who has a history with this emerging community, which statement below comes closest to your own current sentiment\..true.attribute_value.attribute"/>
              <parameter key="25" value="As someone who has a history with this emerging community, which statement below comes closest to your own current sentiment\. - comment.true.attribute_value.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.3.008" expanded="true" height="76" name="Set Role" width="90" x="179" y="30">
            <parameter key="attribute_name" value="id"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.3.008" expanded="true" height="76" name="Filter Examples" width="90" x="313" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="Completed=?"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
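    A minimal Python sketch of how a keep-matching filter treats the parameter string "Completed=?". It assumes, as the Filter Examples documentation describes, that the operator keeps the examples that satisfy the condition, and that "?" stands for a missing value. The rows, the helper `filter_examples` and its `invert` flag are hypothetical stand-ins, not RapidMiner code. Under those assumptions, "Completed=?" selects the empty responses rather than removing them.

```python
from typing import Callable, Optional

# Hypothetical survey rows: None marks a missing value (RapidMiner shows it as "?").
Row = dict[str, Optional[str]]

rows: list[Row] = [
    {"id": "1", "Completed": "2013-05-21", "Last page": "2"},
    {"id": "2", "Completed": None, "Last page": None},
    {"id": "3", "Completed": "2013-05-22", "Last page": "2"},
]


def filter_examples(data: list[Row], condition: Callable[[Row], bool],
                    invert: bool = False) -> list[Row]:
    """Keep the rows that satisfy `condition` (or the rows that fail it when invert=True)."""
    return [row for row in data if condition(row) != invert]


def completed_is_missing(row: Row) -> bool:
    # Plays the role of the condition string "Completed=?".
    return row["Completed"] is None


kept = filter_examples(rows, completed_is_missing)
print([r["id"] for r in kept])  # ['2']: only the empty response survives

kept_inverted = filter_examples(rows, completed_is_missing, invert=True)
print([r["id"] for r in kept_inverted])  # ['1', '3']
```

    If the operator behaves this way, inverting the condition (if your version offers an invert option) or selecting the rows where Completed is present would keep the real responses instead.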
  • earmijo Member Posts: 270 Unicorn
    Say you have a CSV file like this:

    y,x1,x2,x3
    12,1,1,3
    13,4,5,6
    7,,,
    ,8,9,10

    where y is the label and x1-x3 are the attributes. The 3rd observation has no attribute values and the 4th has no label. If you want to filter out the 3rd observation, use Filter Examples with condition class = "no missing attributes". That should do the trick.
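    The same example as a small Python sketch, assuming "no missing attributes" means that every regular attribute (x1-x3 here) must have a value while the label y is not checked. The function name `no_missing_attributes` is illustrative only, not a RapidMiner API.

```python
import csv
import io

CSV_TEXT = """y,x1,x2,x3
12,1,1,3
13,4,5,6
7,,,
,8,9,10
"""

ATTRIBUTES = ["x1", "x2", "x3"]  # regular attributes; "y" is the label


def no_missing_attributes(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep rows whose regular attributes are all non-empty (the label is ignored)."""
    return [row for row in rows if all(row[name] != "" for name in ATTRIBUTES)]


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
kept = no_missing_attributes(rows)
print([row["y"] for row in kept])  # ['12', '13', '']
```

    The 4th row stays because only its label is missing; the 3rd row, which has no attribute values at all, is dropped.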
  • ccapra Member Posts: 6 Contributor II
    Thanks earmijo! 

    Tried that, and it also filters out all rows.

    Each 'empty' row does have a response ID, but all other columns are empty. Is it the ID that's causing the problem?
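    A hedged Python sketch contrasting two rules on made-up survey rows. It assumes the ID has the id role (as the Set Role step in the process above gives it) and is therefore not checked as a regular attribute. It illustrates one possible reason why "no missing attributes" can drop every row: if even complete responses leave some optional question blank (the "Comment" column here is hypothetical), they all count as having a missing attribute. A rule that drops only the rows where every non-ID column is empty behaves differently.

```python
from typing import Optional

Row = dict[str, Optional[str]]

# Made-up responses; None is a missing value. "id" is the ID column.
rows: list[Row] = [
    {"id": "1", "Last page": "2", "Q1": "4", "Comment": "Great lab"},
    {"id": "2", "Last page": "2", "Q1": "5", "Comment": None},    # complete, no comment
    {"id": "3", "Last page": None, "Q1": None, "Comment": None},  # opened, never filled in
]

ID_COLUMN = "id"


def regular_values(row: Row) -> list[Optional[str]]:
    # Every column except the ID column is treated as a regular attribute.
    return [value for name, value in row.items() if name != ID_COLUMN]


def no_missing_attributes(data: list[Row]) -> list[Row]:
    """Strict rule: drop a row if ANY regular attribute is missing."""
    return [row for row in data if all(v is not None for v in regular_values(row))]


def drop_all_empty(data: list[Row]) -> list[Row]:
    """Looser rule: drop a row only if EVERY regular attribute is missing."""
    return [row for row in data if any(v is not None for v in regular_values(row))]


print([r["id"] for r in no_missing_attributes(rows)])  # ['1']
print([r["id"] for r in drop_all_empty(rows)])  # ['1', '2']
```

    In this made-up data the strict rule keeps only the response that answered every question, while the looser rule removes just the empty response; the ID column plays no part in either decision.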
  • earmijo Member Posts: 270 Unicorn
    It should work (it works in my little example), so the problem must be in your Excel file. If the data are not confidential, upload the xls file.

    How are the missing values coded in Excel? NA()? Try replacing them with blanks.
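    If the "empty" cells actually hold placeholder text (for example "#N/A", which is what Excel's NA() displays), a filter that looks for missing values will not see them. This Python sketch normalizes a hypothetical set of placeholder strings to None before any emptiness test; the set `MISSING_MARKERS` is an assumption to adjust to whatever the export really contains.

```python
from typing import Optional

# Hypothetical placeholder strings that should count as "no answer".
MISSING_MARKERS = {"", "#N/A", "NA", "N/A"}


def normalize(value: Optional[str]) -> Optional[str]:
    """Map placeholder text (after trimming whitespace) to None; keep real values."""
    if value is None or value.strip() in MISSING_MARKERS:
        return None
    return value


raw = ["2", "#N/A", "  ", None, "NA", "5"]
print([normalize(v) for v in raw])  # ['2', None, None, None, None, '5']
```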
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    Another approach is to use the Generate Attributes operator to create a new attribute based on the values of one or more other attributes.

    So you could create a new attribute called "dodgy" using if(Completed=="a bad value", 1, 0).

    Then use Filter Examples on dodgy to remove the ones you don't need. Filter Examples keeps the examples that match its condition, so filter on dodgy=0 to keep the good rows.
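    The flag-then-filter idea, sketched in Python rather than with the Generate Attributes and Filter Examples operators. What counts as "a bad value" is up to you; this sketch assumes (hypothetically) that a row is dodgy when its "Completed" value is missing or blank. It keeps the rows with dodgy == 0, because a filter of this kind keeps the rows that match its condition.

```python
from typing import Optional

rows = [
    {"id": "1", "Completed": "2013-05-21"},
    {"id": "2", "Completed": None},
    {"id": "3", "Completed": ""},
]


def is_bad(value: Optional[str]) -> bool:
    # Hypothetical definition of "a bad value": missing or blank.
    return value is None or value.strip() == ""


# Step 1: add a new "dodgy" attribute, like if(Completed=="a bad value", 1, 0).
flagged = [{**row, "dodgy": 1 if is_bad(row["Completed"]) else 0} for row in rows]

# Step 2: keep only the good rows (dodgy == 0).
clean = [row for row in flagged if row["dodgy"] == 0]
print([row["id"] for row in clean])  # ['1']
```

    Keeping the flag as its own column makes it easy to inspect which rows would be dropped before actually filtering them out.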

    Andrew
  • ccapra Member Posts: 6 Contributor II
    Thanks both Earmijo & Andrew.
    In the Excel sheet, the cells are empty. I'd upload it, but can't see how.

    Andrew, I'll try your approach, assuming I can figure out what you mean by "a bad value" :)
  • awchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello ccapra

    A "bad value" is whatever you decide it is in the context of your data.

    Regards

    Andrew
  • ccapra Member Posts: 6 Contributor II
    So, I somehow figured out that the 'Completed' field had dates in a format that RapidMiner wasn't reading. After trying all the date formats and not finding one that worked, I decided to use another attribute that also had null values. What finally worked was the 'Last page' field, which was either 2 or null, but it only worked when I set the data type to 'attribute_value' (not binominal, numerical or integer).
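    A Python sketch of why filtering on 'Completed' could not separate the rows once its dates failed to parse, while 'Last page' could. It assumes that values which do not match the configured date format end up as missing values. The sample strings are hypothetical, and DATE_FORMAT is the Python strptime equivalent of the process's date_format "EEE, MMM d, ''yy".

```python
from datetime import datetime
from typing import Optional

# Python equivalent of the process's date_format "EEE, MMM d, ''yy".
DATE_FORMAT = "%a, %b %d, '%y"

# Hypothetical export: Completed holds timestamps in a different layout.
rows = [
    {"id": "1", "Completed": "2013-05-21 10:15:00", "Last page": "2"},
    {"id": "2", "Completed": None, "Last page": None},
    {"id": "3", "Completed": "2013-05-22 09:03:00", "Last page": "2"},
]


def parse_date(text: Optional[str]) -> Optional[datetime]:
    """Return a datetime, or None when the value is absent or does not fit DATE_FORMAT."""
    if text is None:
        return None
    try:
        return datetime.strptime(text, DATE_FORMAT)
    except ValueError:
        return None


parsed = [parse_date(row["Completed"]) for row in rows]
print(parsed)  # [None, None, None]: Completed can no longer tell rows apart

# 'Last page' is either "2" or missing, so it still separates real responses.
answered = [row for row in rows if row["Last page"] is not None]
print([row["id"] for row in answered])  # ['1', '3']
```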

    So, what does that mean? (It works, but I don't get it.)

    Anyway, thanks for the help!