Get Dispersion Information About Categorical Data

joshhazel Member Posts: 2 Contributor I
edited November 2018 in Help
I have attributes that are categorical (a finite number of different values like 1st class, 2nd class, 3rd class, 4th, etc.), and I would like to know whether RapidMiner can output information about an attribute, such as the count of each value (i.e., there are 100 1st class, 150 2nd class, etc.).

Can I do this? I noticed that after I run my process, the meta data view gives me the most common value and its count and the least common value and its count, but how do I see the rest of them?

Answers

  • Skirzynski Member Posts: 164 Maven
    You can get the count of each value by using the "Aggregate" operator with the count function, grouping by the same attribute you are counting. Here is an example on generated data.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.009">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.009" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_nominal_data" compatibility="5.3.009" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="30">
            <parameter key="number_of_attributes" value="1"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.3.009" expanded="true" height="76" name="Aggregate" width="90" x="246" y="30">
            <list key="aggregation_attributes">
              <parameter key="att1" value="count"/>
            </list>
            <parameter key="group_by_attributes" value="|att1"/>
          </operator>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
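
To make clear what the grouped count produces, here is a minimal sketch of the same idea outside RapidMiner, in plain Python. The `values` list is made-up sample data standing in for one categorical attribute; it does not come from the process above, and this is only an illustration of the computation, not how the Aggregate operator is implemented.

```python
from collections import Counter

# Hypothetical sample values standing in for one categorical attribute.
values = [
    "1st class", "2nd class", "1st class",
    "3rd class", "2nd class", "1st class",
]

# Group identical values together and count the members of each group,
# which is what "group by att1, count att1" amounts to.
counts = Counter(values)

# Print every distinct value with its count, most frequent first.
for value, count in counts.most_common():
    print(f"{value}: {count}")
```

Unlike the meta data view, which shows only the most and least common values, this lists every distinct value, just as the result of the Aggregate operator does.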