Count lines

land · June 2008

Quoted from sf-forum:

Hi,

Which operator can I use to count lines after applying a filter?
Also, which operator can I use to filter for unique values in the data?

Thanks.

land · June 2008

Hi,
The lines you want to count are called examples in RapidMiner. The number of examples is shown in the result tab of the example set once the process has finished. If the example set is consumed by another operator, set a breakpoint after your filter operator.

What do you mean by filtering for unique values? The ExampleFilter operator with the condition_class attribute_value_filter lets you keep only the examples that meet a condition such as:

Attr1 == 4

This means that every example whose attribute Attr1 does not have the value 4 is discarded.
Other comparison operators are !=, >=, >, <, and <=. Conditions can be combined logically with || (or) or && (and).
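
For readers who think in code, the filter semantics described above can be sketched in plain Python. This is only an illustration and not how RapidMiner implements ExampleFilter: the list-of-dicts data, the second attribute Attr2, and the helper `keep` are made up for the example.

```python
# Illustrative sketch only: examples are modeled as dicts, not as a RapidMiner ExampleSet.
examples = [
    {"Attr1": 4, "Attr2": 10},
    {"Attr1": 7, "Attr2": 3},
    {"Attr1": 4, "Attr2": 1},
    {"Attr1": 2, "Attr2": 8},
]

def keep(examples, condition):
    """Return only the examples for which condition(example) is true (hypothetical helper)."""
    return [ex for ex in examples if condition(ex)]

# Counterpart of the condition "Attr1 == 4": every other example is discarded.
only_fours = keep(examples, lambda ex: ex["Attr1"] == 4)

# Counterpart of "Attr1 == 4 && Attr2 > 5": both parts must hold.
both = keep(examples, lambda ex: ex["Attr1"] == 4 and ex["Attr2"] > 5)

# Counterpart of "Attr1 == 4 || Attr2 > 5": at least one part must hold.
either = keep(examples, lambda ex: ex["Attr1"] == 4 or ex["Attr2"] > 5)

# The "number of lines" after filtering is the number of remaining examples.
print(len(only_fours), len(both), len(either))  # prints: 2 1 3
```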

Greetings,
Sebastian

Legacy User · June 2008

Thanks for the prompt answer.

I'm able to filter data, but I would like to use the count result as input to a further operator. Is that possible?
For instance, filter1 yields a set of 15 lines; I would then like to apply another filter to build another set with a new attribute att4 = att4/15.

As for the unique count, I would like to count lines after removing duplicate values of a given attribute.
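
Expressed in plain Python rather than RapidMiner operators, the request above amounts to roughly the following sketch. The data, the filter condition, and the variable names are invented for illustration; only att4 and the idea of dividing by the line count come from the description.

```python
# Illustrative sketch only; the rows and the filter condition are invented.
rows = [
    {"att1": "a", "att4": 30.0},
    {"att1": "b", "att4": 15.0},
    {"att1": "a", "att4": 45.0},
    {"att1": "c", "att4": 5.0},
]

# Step 1: apply a filter and count the remaining lines.
filtered = [r for r in rows if r["att4"] >= 15.0]
n = len(filtered)

# Step 2: reuse the count as input to build a new set with att4 = att4 / n.
scaled = [{**r, "att4": r["att4"] / n} for r in filtered]

# Unique count: number of distinct att1 values once duplicates are removed.
unique_att1 = len({r["att1"] for r in filtered})

print(n, [r["att4"] for r in scaled], unique_att1)
```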

Thanks for your help.

TobiasMalbrecht · June 2008

Hi,

I am not quite sure whether I have understood what you want to do, so let me try to sort out your intention. Correct me if I get it wrong. First you want to count the distinct values of each attribute; then, in a second step, you want to scale the values of each attribute by the reciprocal of this number, i.e. the number of distinct values the attribute has. Is that correct? Or can the attribute used for counting distinct values differ from the attribute being scaled? Another question: if you want to scale the attribute values, I assume that your attributes are numerical? If that is the case, i.e. all attributes are numerical, I would say that the task you want to perform is not possible at the moment.

However, I have attached a process which shows how to count the occurrences of attribute values (of att1) and how to attach the number of occurrences to the corresponding attribute value. Check out the process and you will understand what I mean. I think this process only works with nominal attributes, but maybe it gives you some inspiration for your process design. Otherwise, maybe you can clarify what exactly you intend to do and answer my questions above. Here is the process I promised:
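
To make this interpretation concrete, here is a minimal Python sketch of "count the distinct values per attribute, then scale by the reciprocal". It is not a RapidMiner process, and the attribute names and values are invented.

```python
# Illustrative sketch only; attribute names and values are invented.
rows = [
    {"att1": 2.0, "att2": 10.0},
    {"att1": 4.0, "att2": 10.0},
    {"att1": 2.0, "att2": 30.0},
    {"att1": 8.0, "att2": 30.0},
]

# Step 1: number of distinct values per attribute.
distinct = {name: len({r[name] for r in rows}) for name in rows[0]}

# Step 2: scale each value by the reciprocal of its attribute's distinct count.
scaled = [
    {name: value / distinct[name] for name, value in r.items()}
    for r in rows
]

print(distinct)
```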


<operator name="Root" class="Process" expanded="yes">
    <!-- Generate a sample example set with nominal attributes -->
    <operator name="NominalExampleSetGenerator" class="NominalExampleSetGenerator">
    </operator>
    <!-- Count the occurrences of each att1 value, grouped by att1 -->
    <operator name="Aggregation" class="Aggregation">
        <list key="aggregation_attributes">
          <parameter key="att1" value="count"/>
        </list>
        <parameter key="group_by_attributes" value="att1"/>
    </operator>
    <!-- Select the second ExampleSet and give att1 the id role -->
    <operator name="IOSelector" class="IOSelector">
        <parameter key="io_object" value="ExampleSet"/>
        <parameter key="select_which" value="2"/>
    </operator>
    <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
        <parameter key="name" value="att1"/>
        <parameter key="target_role" value="id"/>
    </operator>
    <!-- Do the same for the other ExampleSet -->
    <operator name="IOSelector (2)" class="IOSelector">
        <parameter key="io_object" value="ExampleSet"/>
        <parameter key="select_which" value="2"/>
    </operator>
    <operator name="ChangeAttributeRole (2)" class="ChangeAttributeRole">
        <parameter key="name" value="att1"/>
        <parameter key="target_role" value="id"/>
    </operator>
    <!-- Join both example sets -->
    <operator name="ExampleSetJoin" class="ExampleSetJoin">
    </operator>
</operator>
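
For comparison, the idea behind this process (count the occurrences of each att1 value, then attach that count to every example with the same value) can be sketched in plain Python. This is only an analogy under invented data; the output column name `count(att1)` is hypothetical and not necessarily what RapidMiner produces.

```python
from collections import Counter

# Illustrative sketch only; the nominal values are invented.
rows = [{"att1": v} for v in ["red", "blue", "red", "green", "red", "blue"]]

# Aggregation step: count the occurrences of each att1 value (group by att1).
counts = Counter(r["att1"] for r in rows)

# Join step: attach the count to every example, matching on the att1 value.
# "count(att1)" is a hypothetical column name chosen for this sketch.
joined = [{**r, "count(att1)": counts[r["att1"]]} for r in rows]

for r in joined:
    print(r["att1"], r["count(att1)"])
```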

Regards,
Tobias

TobiasMalbrecht · June 2008

Hi again,

I must add that the process I gave in my last posting only works with the newest CVS version of RapidMiner, since we recently extended the Aggregation operator to allow the aggregation of multiple attributes as well as grouping by several attributes.

http://rapid-i.com/content/view/25/48/lang,de/ explains how to access the CVS version easily via Eclipse.

Regards,
Tobias
