
Count lines

land (RapidMiner Certified Analyst, RapidMiner Certified Expert)
edited November 2018 in Help
Quoted from sf-forum:

Hi,

Which operator can I use to count lines after applying a filter?
Also, which operator can I use to filter for unique values in the data?

Thanks.

Answers

  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert)
    Hi,
    In RapidMiner, the lines you want to count are called examples. The number of examples is shown in the result tab of the example set once the process has finished. If the example set is consumed by a subsequent operator, set a breakpoint after your filter operator to inspect it.

    What do you mean by filtering for unique values? The ExampleFilter operator with condition_class set to attribute_value_filter lets you keep only the examples that meet a condition such as:
        Attr1 == 4
    With this condition, every example whose value of attribute Attr1 is not 4 is discarded.
    The other comparison operators are !=, >=, >, <, and <=. Conditions can be combined logically with || (or) and && (and).
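    The filter semantics described above can be mimicked outside RapidMiner to see which examples survive a condition and how many remain (the count you would otherwise read off the result tab). This is a minimal plain-Python sketch, not RapidMiner's implementation: examples are assumed to be dicts, the helper names example_filter and matches are hypothetical, and it assumes && binds tighter than || with no parentheses. RapidMiner's actual condition parser may behave differently.

```python
import operator
import re

# Comparison operators named in the post, mapped to Python functions.
OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
       "<=": operator.le, ">": operator.gt, "<": operator.lt}
# Two-character operators are listed before ">" and "<" so ">=" is not read as ">".
COMPARISON = re.compile(r"^\s*(\w+)\s*(==|!=|>=|<=|>|<)\s*(\S+)\s*$")


def parse_value(token):
    """Interpret the right-hand side as a number when possible, else as a string."""
    try:
        return float(token)
    except ValueError:
        return token


def _compare(example, term):
    """Evaluate a single 'attribute op value' term against one example."""
    m = COMPARISON.match(term)
    if m is None:
        raise ValueError(f"cannot parse condition term: {term!r}")
    name, op, raw = m.groups()
    value = example[name]
    target = parse_value(raw)
    if isinstance(target, float):
        value = float(value)
    return OPS[op](value, target)


def matches(example, condition):
    """True if any ||-separated part has all of its &&-separated terms true.

    Assumption: && binds tighter than ||, and no parentheses are used.
    """
    for disjunct in condition.split("||"):
        if all(_compare(example, term) for term in disjunct.split("&&")):
            return True
    return False


def example_filter(examples, condition):
    """Keep only the examples that satisfy the condition; all others are discarded."""
    return [ex for ex in examples if matches(ex, condition)]


examples = [
    {"Attr1": 4, "Attr2": 1.5},
    {"Attr1": 3, "Attr2": 2.0},
    {"Attr1": 4, "Attr2": 0.5},
]
print(len(example_filter(examples, "Attr1 == 4")))               # 2
print(len(example_filter(examples, "Attr1 == 4 && Attr2 > 1")))  # 1
print(len(example_filter(examples, "Attr1 != 4 || Attr2 < 1")))  # 2
```

    Counting the filtered examples is then just the length of the returned list.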


    Greetings,
      Sebastian
  • Legacy User
    Thanks for the prompt answer.

    I'm able to filter the data, but I would like to use the count result as input to a further operator. Is that possible?
    For instance, filter1 produces a set of 15 lines; I would then like to apply another filter to build another set with a new attribute att4 = att4/15.

    As for the unique count, I would like to count lines after removing duplicate values of a given attribute.
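    The two steps asked for here (reusing a filter's count in a later step, and counting distinct values) can be pinned down with a short plain-Python illustration. It is not a RapidMiner process; the toy data, the filter condition att4 > 20 and the output name att4_scaled are all hypothetical, and the scaled value is stored under a new name so the original att4 is kept.

```python
# Hypothetical toy data: each dict is one line (example); att4 is numerical.
examples = [
    {"att1": "a", "att4": 30.0},
    {"att1": "b", "att4": 45.0},
    {"att1": "a", "att4": 60.0},
    {"att1": "c", "att4": 15.0},
]

# Step 1: filter, then count the remaining lines.
filtered = [ex for ex in examples if ex["att4"] > 20]  # hypothetical condition
n = len(filtered)  # plays the role of "15" in the question

# Step 2: reuse the count to derive a new attribute (att4 divided by the count).
scaled = [{**ex, "att4_scaled": ex["att4"] / n} for ex in filtered]

# Unique count: number of distinct values of att1 once duplicates are dropped.
distinct_att1 = len({ex["att1"] for ex in examples})
```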

    Thanks for your help.
  • TobiasMalbrecht (Moderator, Employee, RM Product Management)
    Hi,

    I am not quite sure I have understood what you want to do, so let me try to restate your intention; correct me if I get it wrong. First, you want to count the distinct values of each attribute; then, in a second step, you want to scale the values of each attribute by the reciprocal of that number, i.e. by one over the number of distinct values the attribute has. Is that correct? Or does the attribute whose distinct values are counted not necessarily have to be the same as the attribute being scaled? Another question: if you want to scale the attribute values, I assume your attributes are numerical? If that is the case, i.e. all attributes are numerical, I would say that the task you want to perform is not possible at the moment.
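    To make the interpretation above concrete, here is a minimal plain-Python sketch, assuming every attribute is numerical and examples are dicts. The function names are hypothetical and nothing here corresponds to a RapidMiner operator; it only spells out "divide each value by the number of distinct values of its attribute".

```python
def distinct_counts(examples):
    """Number of distinct values per attribute across all examples."""
    if not examples:
        return {}
    attributes = examples[0].keys()
    return {a: len({ex[a] for ex in examples}) for a in attributes}


def scale_by_reciprocal_distinct(examples):
    """Multiply each value by 1 / (number of distinct values of its attribute)."""
    counts = distinct_counts(examples)
    return [{a: v / counts[a] for a, v in ex.items()} for ex in examples]


examples = [
    {"att1": 2.0, "att2": 10.0},
    {"att1": 4.0, "att2": 10.0},
    {"att1": 2.0, "att2": 20.0},
    {"att1": 6.0, "att2": 20.0},
]
counts = distinct_counts(examples)             # att1 has 3 distinct values, att2 has 2
scaled = scale_by_reciprocal_distinct(examples)
```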

    However, I have attached a process that shows how to count the occurrences of the values of an attribute (att1) and how to attach the number of occurrences to the corresponding attribute value. Check out the process and you will see what I mean. I think this process only works with nominal attributes, but maybe it will give you some inspiration for your process design. Otherwise, perhaps you can clarify a little what exactly you intend to do and answer my questions above. Here is the process I promised:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="NominalExampleSetGenerator" class="NominalExampleSetGenerator">
        </operator>
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="att1" value="count"/>
            </list>
            <parameter key="group_by_attributes" value="att1"/>
        </operator>
        <operator name="IOSelector" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="select_which" value="2"/>
        </operator>
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="att1"/>
            <parameter key="target_role" value="id"/>
        </operator>
        <operator name="IOSelector (2)" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="select_which" value="2"/>
        </operator>
        <operator name="ChangeAttributeRole (2)" class="ChangeAttributeRole">
            <parameter key="name" value="att1"/>
            <parameter key="target_role" value="id"/>
        </operator>
        <operator name="ExampleSetJoin" class="ExampleSetJoin">
        </operator>
    </operator>
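    Conceptually, the process above performs an aggregation (count of examples per att1 value) followed by a join that uses att1 as the key, so each example ends up carrying the count of its own att1 value. A plain-Python sketch of that idea, assuming examples are dicts; the count column label is a hypothetical name, not necessarily what the Aggregation operator produces. The group_by tuple also allows grouping by several attributes.

```python
from collections import Counter


def attach_occurrence_counts(examples, group_by=("att1",)):
    """Attach to each example the number of examples sharing its group_by values."""
    def key(ex):
        return tuple(ex[a] for a in group_by)

    # Aggregation step: count examples per distinct combination of group_by values.
    counts = Counter(key(ex) for ex in examples)
    # Join step: match each example back to the count of its own group.
    label = "count(" + ",".join(group_by) + ")"  # hypothetical column name
    return [{**ex, label: counts[key(ex)]} for ex in examples]


examples = [
    {"att1": "a", "att2": "x"},
    {"att1": "b", "att2": "x"},
    {"att1": "a", "att2": "y"},
]
by_att1 = attach_occurrence_counts(examples)
by_pair = attach_occurrence_counts(examples, group_by=("att1", "att2"))
```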
    Regards,
    Tobias
  • TobiasMalbrecht (Moderator, Employee, RM Product Management)
    Hi again,

    I should add that the process in my last post only works with the newest CVS version of RapidMiner, since we recently extended the Aggregation operator to allow aggregating multiple attributes as well as grouping by several attributes.

    http://rapid-i.com/content/view/25/48/lang,de/

    explains how to access the CVS version easily via Eclipse.

    Regards,
    Tobias