[SOLVED] Filter Attributes

Analyticaltim Member Posts: 15 Contributor II
edited June 2020 in Help
Dear Rapid Community,

While this question is undeniably basic, I am at my wit's end as to how to solve it, so I turn to you.  :o

I am working with a dataset of housing sales figures in NYC. One of my attributes is called "NEIGHBORHOODS", and I want to filter specific neighborhoods out of this larger dataset for exploration. To do so, I use the "Filter Examples" operator, select "attribute_value_filter", and enter the parameter string "NEIGHBORHOOD=FORT GREENE" (all the original data is in capitals, hence the uppercase string). This does not return the filtered data. Instead, the Results window shows an ExampleSet with 0 examples, 0 special attributes, and 3 regular attributes.

I have checked my spelling again and again, checked the data to make sure it is all there, and searched all over the internet to make sure my parameter string is correct, all to no avail.

There is certainly something I am missing. Any help is much appreciated.

Yours,
Tim

Answers

  • venkatesh Member Posts: 15 Contributor II
    Is the attribute defined as nominal or text?
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    One of my Attributes is called "NEIGHBORHOODS"
    [...]
    and use the string: "NEIGHBORHOOD=FORT GREENE"
    You're missing an S there ;D

    On a more serious note, I just created an ExampleSet with such an attribute and tested your condition and it worked flawlessly for me (tried with attribute as nominal, polynominal and text). What version of RapidMiner are you using? Can you post your process setup here (go to the XML tab and just copy&paste)?

    Regards,
    Marco
  • Analyticaltim Member Posts: 15 Contributor II
    Dear Marco,

    You are quite right on the "S"!  :o

    I currently have the "NEIGHBORHOOD" attribute set as polynominal. Could this be the problem?

    Below is the XML of my filter process.

    Thanks again for all your help. RapidMiner rocks!

    Tim


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
        <process expanded="true" height="437" width="654">
          <operator activated="true" class="retrieve" compatibility="5.3.000" expanded="true" height="60" name="Retrieve Brooklyn Big with date" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Tim's Repository/Real Estate Work/Brooklyn Big with date"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.3.000" expanded="true" height="76" name="Filter Examples" width="90" x="246" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="NEIGHBORHOOD=FORT GREENE "/>
          </operator>
          <connect from_op="Retrieve Brooklyn Big with date" from_port="output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    No, it should not matter. You have included a whitespace at the end of your condition ("FORT GREENE "), though; please make sure that's not the error.

    Apart from that, I don't know. It works for me, so I'm afraid I cannot help you any further without the actual data. If you could provide a minimal sample (you can use the Filter Example Range and Select Attributes operators to reduce it to the absolute minimum needed), I could have a look and check whether a bug is involved. If the data should not be publicly visible, you can contact me via PM.

    Regards,
    Marco
  • Analyticaltim Member Posts: 15 Contributor II
    Dear Marco,

    Below is some original data that I extracted with the "Filter Example Range" operator. Within this example range, the problem persists for me as well. You are correct about the whitespace in the condition. I had added it to test whether that was my problem, and it accidentally ended up in the XML I sent you. Sorry. The truncated dataset is below. The same problem occurs with any neighborhood in this sample, for example Bath Beach or Carroll Gardens.

    Thanks again for your help!
    Tim

    "NEIGHBORHOOD","SALE PRICE","SALE DATE"
    "CARROLL GARDENS          ",907278.0,10/9/12 12:00 AM
    "CARROLL GARDENS          ",1522283.0,8/22/12 12:00 AM
    "CARROLL GARDENS          ",885000.0,8/22/12 12:00 AM
    "CARROLL GARDENS          ",1508642.0,8/10/12 12:00 AM
    "CARROLL GARDENS          ",830000.0,8/7/12 12:00 AM
    "CARROLL GARDENS          ",1483413.0,8/30/12 12:00 AM
    "BEDFORD STUYVESANT      ",712775.0,9/27/12 12:00 AM
    "BEDFORD STUYVESANT      ",700000.0,10/24/12 12:00 AM
    "BEDFORD STUYVESANT      ",700000.0,10/24/12 12:00 AM
    "BEDFORD STUYVESANT      ",450000.0,11/14/12 12:00 AM
    "BATH BEACH              ",0.0,11/19/12 12:00 AM
    "BATH BEACH              ",0.0,11/12/12 12:00 AM
    "BATH BEACH              ",0.0,11/13/12 12:00 AM
    "BATH BEACH              ",0.0,11/13/12 12:00 AM
    "BATH BEACH              ",0.0,12/7/12 12:00 AM
    "BATH BEACH              ",0.0,11/7/12 12:00 AM
    "BATH BEACH              ",610000.0,6/28/12 12:00 AM
    "BATH BEACH              ",0.0,5/3/12 12:00 AM
    "BATH BEACH              ",0.0,3/26/12 12:00 AM
    "BATH BEACH              ",508000.0,8/24/12 12:00 AM
    "BATH BEACH              ",690000.0,11/14/12 12:00 AM
    "BATH BEACH              ",0.0,2/6/12 12:00 AM
    "BATH BEACH              ",800000.0,2/6/12 12:00 AM
    "BATH BEACH              ",420000.0,4/4/12 12:00 AM
    "BATH BEACH              ",500000.0,7/19/12 12:00 AM
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    There we go. You're having trouble because of the whitespace at the end of each NEIGHBORHOOD value. Sadly, due to some restrictions, the parameter you enter gets trimmed, i.e. its leading and trailing whitespace is removed, so the condition can never match the padded values and the filter returns nothing. What you can do is remove the whitespace from your NEIGHBORHOOD attribute via the Generate Attributes operator. Just add it after you retrieve your data and before the Filter Examples operator. Then add a key/value pair to the function descriptions parameter as follows:

    attribute name: NEIGHBORHOOD_NEW
    function expressions: trim(NEIGHBORHOOD)
    You can then filter on the NEIGHBORHOOD_NEW attribute and will finally get your desired results :)
    We plan to enhance the Filter Examples operator in the future, but until then I'm afraid the workaround is necessary in this case.

    Regards,
    Marco
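
    For readers who prefer to paste the workaround into the XML tab, here is a minimal sketch of how the process posted earlier in the thread might look with the Generate Attributes step added. It assumes RapidMiner 5.3 and the operator class name "generate_attributes" with a "function_descriptions" list; the x/y/height/width layout values are illustrative guesses, and the repository path is the one from the original process, so adjust both as needed.

    ```xml
    <process version="5.3.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
        <process expanded="true" height="437" width="654">
          <operator activated="true" class="retrieve" compatibility="5.3.000" expanded="true" height="60" name="Retrieve Brooklyn Big with date" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Tim's Repository/Real Estate Work/Brooklyn Big with date"/>
          </operator>
          <!-- Assumed operator XML: creates NEIGHBORHOOD_NEW as a copy of NEIGHBORHOOD without leading and trailing whitespace -->
          <operator activated="true" class="generate_attributes" compatibility="5.3.000" expanded="true" height="76" name="Generate Attributes" width="90" x="179" y="30">
            <list key="function_descriptions">
              <parameter key="NEIGHBORHOOD_NEW" value="trim(NEIGHBORHOOD)"/>
            </list>
          </operator>
          <!-- The filter now targets the trimmed attribute, so the trimmed parameter string can match -->
          <operator activated="true" class="filter_examples" compatibility="5.3.000" expanded="true" height="76" name="Filter Examples" width="90" x="313" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="NEIGHBORHOOD_NEW=FORT GREENE"/>
          </operator>
          <connect from_op="Retrieve Brooklyn Big with date" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    ```

    The original NEIGHBORHOOD attribute is left untouched in this sketch, so the padded values remain available if anything downstream depends on them.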
  • Analyticaltim Member Posts: 15 Contributor II
    Marco!

    My man! It worked like a dream! Thank you very much. You are tops!  ;D :D

    Thanks again for all your help.
    RapidMiner Rocks!

    Tim