count function for nominal values

lghansse Member Posts: 18 Contributor II
edited October 2019 in Help



I have a list of 25 nominal attributes that I would like to aggregate into a single attribute counting how many of the 25 have a valid (i.e. non-missing) value, but I'm at a loss as to how to do this easily. I've looked at the Aggregate, Generate Aggregation and Generate Attributes operators: the aggregation functions seem useful only for integers, and Generate Attributes does not have a count function (at least, not one that I've found). I've included an example below for clarity.


att1      att2      att3      att4      att5
valuex    valuey    missing   valuez    missing
missing   valuex    missing   valuey    missing


-> So the new attribute should have value 3 for example 1, and 2 for example 2. 


Does anyone have experience with this?


Best Answer

    Edin_Klapic Moderator, Employee, RMResearcher, Member Posts: 299 RM Data Scientist
    Solution Accepted

    Hi @lghansse,


    The Generate Aggregation operator has an aggregation function called count.

    It works as expected - I get the value 3 for example 1, and 2 for example 2. 

    See screenshot below for details.


    Best regards,








    lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi Lise,


    If you have Python installed on your computer, you can use the "Execute Python" operator (available to download and install via the Marketplace) to perform this task; it takes only one line of code.

    The process below uses your fictitious example set.

    The calculated "count_valid_values" attribute is in the last column.

    Here is the process:

    <?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
    <operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.0.001" expanded="true" height="68" name="Read CSV" width="90" x="112" y="85">
    <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Count_Attribute.csv"/>
    <list key="annotations"/>
    <list key="data_set_meta_data_information"/>
    </operator>
    <operator activated="true" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Execute Python" width="90" x="313" y="85">
    <parameter key="script" value="import pandas as pd&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main(data):&#10;&#10; #data['count_missing'] = data.shape[1] - data.count(axis=1)&#10; data['count_valid_values'] = data.count(axis=1) &#10;&#10; # connect 1 output port to see the results&#10; return data"/>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Execute Python" to_port="input 1"/>
    <connect from_op="Execute Python" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

    Your fictitious example set is in the attached file.
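
For readers who want to try the idea without the attached file, the one-line count from the script above can be sketched as a standalone snippet. It assumes pandas and NumPy are available and that missing values reach Python as NaN; the small frame is rebuilt by hand from the two example rows in the question, and the `count_missing` column mirrors the line that is commented out in the original script.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the two example rows from the question.
# Assumption: "missing" arrives in Python as NaN.
data = pd.DataFrame(
    {
        "att1": ["valuex", np.nan],
        "att2": ["valuey", "valuex"],
        "att3": [np.nan, np.nan],
        "att4": ["valuez", "valuey"],
        "att5": [np.nan, np.nan],
    }
)


def rm_main(data):
    # Remember the original attributes so the new columns
    # are not included in their own counts.
    attributes = list(data.columns)
    result = data.copy()
    # DataFrame.count(axis=1) counts the non-missing cells in each row.
    result["count_valid_values"] = data[attributes].count(axis=1)
    # The number of missing cells is the remainder of the row.
    result["count_missing"] = len(attributes) - result["count_valid_values"]
    return result


result = rm_main(data)
print(result[["count_valid_values", "count_missing"]])
```

With these two rows, `count_valid_values` comes out as 3 and 2, matching the values expected in the question. Fixing the attribute list up front is a small design choice so that both counts can be added in the same pass.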


    I hope this will be helpful






    lghansse Member Posts: 18 Contributor II

    Thank you. I had tried it before, but made the mistake of unticking the "ignore missings" checkbox because I assumed that leaving it ticked would mean nothing was counted when an attribute had missing values (which would defeat my purpose).


    Thanks for the help!


