
Histogram Question [SOLVED]

Alex_Pelaez Member Posts: 3 Contributor I
edited November 2018 in Help
I have a dataset and am trying to create a histogram, but I cannot get the categories in the histogram sorted. I have tried sorting the data set, and I even tried creating a new attribute. Either way, I cannot get the histogram to show the highest frequencies on the left or on the right. Can anyone provide some insight?

Alex

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Alex,

    Unfortunately, it is not possible to sort the axes of the plotter explicitly. The values aren't sorted alphabetically either, but in the order in which they were added to the dataset. So to get a sorted histogram, you have to create a new attribute and add the values ordered by their frequency. This requires a rather large and clumsy process, but it is possible: see the attached process.
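
To make the ordering rule concrete, here is a minimal Python sketch of "order in which values were added". It only models the behaviour described above and is not RapidMiner code; the function name `category_order` is made up for illustration.

```python
def category_order(values):
    """Return the distinct values in order of first appearance.

    This mimics the behaviour described above (an assumption, not
    RapidMiner's actual implementation): a nominal value gets its position
    on the axis when it is first added to the data set.
    """
    # dict preserves insertion order, so the keys come out in the order
    # in which each value was first seen.
    return list(dict.fromkeys(values))


# Hypothetical example data: "b" is seen first, then "a", then "c".
example = ["b", "a", "b", "c", "a"]
order = category_order(example)
```

For this example the order is b, a, c, which is neither alphabetical nor by frequency. That is why simply sorting an existing attribute does not change the histogram.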

    After generating some data, it is aggregated manually by att1, then sorted by the frequencies. Then a new attribute natt1 is created, with the same values as att1. Since the example set is in the desired order after sorting, and Generate Attributes processes the data set from top to bottom, the values are added in the correct order. The problem is that we are now working on the aggregated data. You could plot it using a scatter plot, but then you would have dots instead of bars. So we have to join it with the original data. Here it is important to do a left join, with the sorted data on the left, otherwise the order won't be kept.
    The last two operators remove the original att1 and rename the new attribute natt1 to att1.

    If you now create a histogram as usual on att1, it will be ordered by the frequencies.
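
For readers who find the logic easier to follow in code, the steps above can be sketched in plain Python. This is only an illustration of the idea (count, sort, then pull in the original rows in the sorted order), not what RapidMiner does internally; the function name `reorder_by_frequency` and the `descending` flag are hypothetical.

```python
from collections import Counter


def reorder_by_frequency(rows, key="att1", descending=True):
    """Reorder rows so that the values of `key` first appear in frequency order.

    `rows` is a list of dicts, a stand-in for an example set.
    """
    # Aggregate: count how many rows carry each value of the key attribute.
    counts = Counter(row[key] for row in rows)

    # Sort: order the distinct values by their count. The sort is stable,
    # so values with equal counts keep their order of first appearance.
    ordered_values = sorted(counts, key=counts.get, reverse=descending)

    # Left join with the sorted values on the left: walk through the sorted
    # values and attach the matching original rows, so the result follows
    # the sorted order rather than the original order.
    joined = []
    for value in ordered_values:
        joined.extend(row for row in rows if row[key] == value)
    return joined


# Hypothetical data: "a" occurs 3 times, "c" twice, "b" once.
data = [{"att1": v} for v in ["b", "a", "c", "a", "c", "a"]]
most_frequent_first = reorder_by_frequency(data)
least_frequent_first = reorder_by_frequency(data, descending=False)
```

In `most_frequent_first` the values first appear as a, c, b, so a plotter that orders categories by first appearance would draw the tallest bar on the left. With `descending=False` the order is reversed.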


    Yes, it is quite complicated and confusing, but it's the only possibility that comes to mind. Maybe there is another creative head around who can find a better solution :)

    Best, Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
        <process expanded="true" height="251" width="1016">
          <operator activated="true" class="generate_nominal_data" compatibility="5.2.006" expanded="true" height="60" name="Generate Nominal Data" width="90" x="45" y="30">
            <parameter key="number_of_values" value="10"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.2.006" expanded="true" height="76" name="Aggregate" width="90" x="179" y="75">
            <list key="aggregation_attributes">
              <parameter key="att1" value="count"/>
            </list>
            <parameter key="group_by_attributes" value="|att1"/>
          </operator>
          <operator activated="true" class="sort" compatibility="5.2.006" expanded="true" height="76" name="Sort" width="90" x="313" y="30">
            <parameter key="attribute_name" value="count(att1)"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.006" expanded="true" height="76" name="Generate Attributes" width="90" x="447" y="30">
            <list key="function_descriptions">
              <parameter key="natt1" value="att1"/>
            </list>
          </operator>
          <operator activated="true" class="join" compatibility="5.2.006" expanded="true" height="76" name="Join" width="90" x="581" y="120">
            <parameter key="join_type" value="left"/>
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="natt1" value="att1"/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.2.006" expanded="true" height="76" name="Select Attributes" width="90" x="715" y="120">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="att1"/>
            <parameter key="invert_selection" value="true"/>
          </operator>
          <operator activated="true" class="rename" compatibility="5.2.006" expanded="true" height="76" name="Rename" width="90" x="849" y="120">
            <parameter key="old_name" value="natt1"/>
            <parameter key="new_name" value="att1"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Sort" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="original" to_op="Join" to_port="right"/>
          <connect from_op="Sort" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • Alex_Pelaez Member Posts: 3 Contributor I
    Thank you so much for your quick response. It sounds like it will work. It is a bit clumsy, but I can easily see how this works.