
Controlling how plots order nominal values

tennenrishin Member Posts: 177 Contributor II
edited November 2018 in Help
Hi.

Suppose a user wants to create a "bars stacked" plot in which both the stacking categories and the grouping (x-axis) categories appear in a defined order.
How can this be accomplished reliably? When the outputs of the following process are plotted, either the x-axis or the stacking categories are out of order, depending on how the data is sorted. This is the minimal example set that induces the problem.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.005">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="generate_data_user_specification" compatibility="5.3.005" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="45" y="300">
       <list key="attribute_values">
         <parameter key="StackBy" value="&quot;red&quot;"/>
         <parameter key="GroupBy" value="&quot;b&quot;"/>
         <parameter key="Value" value="1"/>
       </list>
       <list key="set_additional_roles"/>
     </operator>
     <operator activated="true" class="generate_data_user_specification" compatibility="5.3.005" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="45" y="210">
       <list key="attribute_values">
         <parameter key="StackBy" value="&quot;blue&quot;"/>
         <parameter key="GroupBy" value="&quot;b&quot;"/>
         <parameter key="Value" value="1"/>
       </list>
       <list key="set_additional_roles"/>
     </operator>
     <operator activated="true" class="generate_data_user_specification" compatibility="5.3.005" expanded="true" height="60" name="Generate Data by User Specification (3)" width="90" x="45" y="390">
       <list key="attribute_values">
         <parameter key="StackBy" value="&quot;red&quot;"/>
         <parameter key="GroupBy" value="&quot;a&quot;"/>
         <parameter key="Value" value="1"/>
       </list>
       <list key="set_additional_roles"/>
     </operator>
     <operator activated="true" class="append" compatibility="5.3.005" expanded="true" height="112" name="Append" width="90" x="313" y="255"/>
     <operator activated="true" class="sort" compatibility="5.3.005" expanded="true" height="76" name="sort by GroupBy" width="90" x="514" y="255">
       <parameter key="attribute_name" value="GroupBy"/>
     </operator>
     <operator activated="true" class="sort" compatibility="5.3.005" expanded="true" height="76" name="sort by StackBy" width="90" x="648" y="255">
       <parameter key="attribute_name" value="StackBy"/>
     </operator>
     <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 2"/>
     <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 1"/>
     <connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 3"/>
     <connect from_op="Append" from_port="merged set" to_op="sort by GroupBy" to_port="example set input"/>
     <connect from_op="sort by GroupBy" from_port="example set output" to_op="sort by StackBy" to_port="example set input"/>
     <connect from_op="sort by StackBy" from_port="example set output" to_port="result 1"/>
     <connect from_op="sort by StackBy" from_port="original" to_port="result 2"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
   </process>
 </operator>
</process>
How can this be solved so that both the stacking order and the x-axis order are correct?

(It is important to have consistently ordered colors/categories when multiple plots need to be compared within/across reports, for example.)
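
To illustrate why re-sorting alone cannot help here, the following is a minimal Python sketch, not RapidMiner code. It assumes that the plot lists nominal values in the order in which they first occur in the example set; the helper names `first_seen` and `orders_ok` are made up for this illustration. It tries every possible row order of the three examples and checks whether any of them yields both the x-axis order a, b and the stacking order blue, red.

```python
from itertools import permutations

# The three examples from the process above: (GroupBy, StackBy, Value).
ROWS = [("b", "red", 1), ("b", "blue", 1), ("a", "red", 1)]


def first_seen(values):
    """Return the distinct values in order of first appearance."""
    return list(dict.fromkeys(values))


def orders_ok(rows):
    """True if both attributes are first encountered in alphabetical order."""
    groups = first_seen(r[0] for r in rows)
    stacks = first_seen(r[1] for r in rows)
    return groups == ["a", "b"] and stacks == ["blue", "red"]


# Try every possible row order (assumption: the plot uses first-seen order).
working = [p for p in permutations(ROWS) if orders_ok(p)]
print(len(working))
```

Under that assumption no row order satisfies both conditions, because the only example in group "a" is "red", so putting "a" first also puts "red" first.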

Regards
Isak

Answers

  • Nils_Woehler Member Posts: 463 Maven
    Hi,

    With the plot view this is not easily possible, because the values are displayed in the order in which they occur in the provided example set.
    What you can do is use the Advanced Charts. There the values are sorted alphabetically.

    Best,
    Nils
  • tennenrishin Member Posts: 177 Contributor II
    Thanks for your response, Nils.

    It needs to go in a report, and the reporting extension doesn't do Advanced Charts AFAIK. So I'm still stuck.

    [quote author=Nils]with the plot view this is not easily possible...[/quote]
    But possible?

    Regards,
    Isak
  • Nils_Woehler Member Posts: 463 Maven
    Hi,

    By "not easily possible" I meant that we would need to implement a new feature that allows specifying the order of the nominal values ;-)
    But this means that it is currently not possible to do this with the normal plot view :-(

    Best,
    Nils
  • tennenrishin Member Posts: 177 Contributor II
    Thanks. Clearly the problem only arises when the 2D field of (x, stack) bins is sparsely populated by examples. I was thinking along these lines: just prior to plotting, add dummy examples (with missing values in the plotted attribute), one for each x group and one for each stacking group. They would be placed so that, after sorting, the dummy examples are guaranteed to be encountered in the correct order, before any competing examples. Hopefully they would not influence the plot in any other way.

    Like this:
    d d d d d d d
    d       p
    d p p    p    p
    d    p p
    d          p    p

    where rows are (ordered) stacking groups and columns are (ordered) x groups, and p are populated bins and d are dummy examples.

    Do you think it could work? I'm not sure yet how to go about it generically.
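
    A minimal Python sketch of this dummy-example idea, outside RapidMiner. It assumes that nominal values are plotted in first-seen order and that examples with a missing Value are not drawn; neither assumption has been verified against the plot view. The function name `with_dummy_rows` and the tuple layout are hypothetical.

```python
def first_seen(values):
    """Return the distinct values in order of first appearance."""
    return list(dict.fromkeys(values))


def with_dummy_rows(rows, group_order, stack_order):
    """Prepend dummy rows so each attribute is first seen in the given order.

    Rows are (GroupBy, StackBy, Value); dummies carry None as a missing Value.
    """
    # One dummy per x group, all placed in the first stacking group.
    dummies = [(g, stack_order[0], None) for g in group_order]
    # One dummy per remaining stacking group, all placed in the first x group.
    dummies += [(group_order[0], s, None) for s in stack_order[1:]]
    return dummies + list(rows)


rows = [("b", "red", 1), ("b", "blue", 1), ("a", "red", 1)]
padded = with_dummy_rows(rows, ["a", "b"], ["blue", "red"])

print(first_seen(r[0] for r in padded))
print(first_seen(r[1] for r in padded))
```

    The dummies correspond to the top row and left column of d entries in the grid above. In a real process they would have to be created with operators and kept ahead of the real examples by whatever sorting precedes the plot.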

    Regards,
    Isak
  • GzF Member Posts: 11 Contributor II
    You could try assigning a numerical value to each of your nominal values.
    Then use the numbers for coloring the plot, and supply a chart that explains how to interpret the colors.

    Try using the Nominal to Numerical operator.
    A second way might be to split your dataset into subsets that each contain only one of the nominal values, generate the same new numerical attribute in each of them, and set its value accordingly.
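
    The numeric-code idea can be sketched in Python as follows. This is only an illustration of the mapping, not RapidMiner code; the function name `add_numeric_code` and the dictionary `legend` are hypothetical, and the attribute is called GroupBy2 only in the comments.

```python
# Examples as (GroupBy, StackBy, Value), in arbitrary order.
ROWS = [("b", "blue", 1), ("a", "red", 1), ("b", "red", 1)]
GROUP_ORDER = ["a", "b"]  # desired order of the nominal GroupBy values


def add_numeric_code(rows, order):
    """Append a numeric code (think: GroupBy2) that encodes the desired order."""
    code = {value: i for i, value in enumerate(order)}
    return [row + (code[row[0]],) for row in rows]


# Sorting on the numeric code gives the desired x-axis order.
coded = sorted(add_numeric_code(ROWS, GROUP_ORDER), key=lambda r: r[3])

# Lookup table for readers of the plot: code -> original nominal value.
legend = {i: value for i, value in enumerate(GROUP_ORDER)}

print(coded)
print(legend)
```

    Because the plotted attribute is now numeric, its order no longer depends on which nominal value is encountered first; the price is that the plot shows codes, so the lookup table has to accompany it.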

    Cheers
    GzF

  • GzF Member Posts: 11 Contributor II
    This should do the job:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.015">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data_user_specification" compatibility="5.3.015" expanded="true" height="60" name="Generate Data by User Specification" width="90" x="45" y="120">
            <list key="attribute_values">
              <parameter key="StackBy" value="&quot;red&quot;"/>
              <parameter key="GroupBy" value="&quot;b&quot;"/>
              <parameter key="Value" value="1"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="5.3.015" expanded="true" height="60" name="Generate Data by User Specification (2)" width="90" x="45" y="30">
            <list key="attribute_values">
              <parameter key="StackBy" value="&quot;blue&quot;"/>
              <parameter key="GroupBy" value="&quot;b&quot;"/>
              <parameter key="Value" value="1"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="generate_data_user_specification" compatibility="5.3.015" expanded="true" height="60" name="Generate Data by User Specification (3)" width="90" x="45" y="210">
            <list key="attribute_values">
              <parameter key="StackBy" value="&quot;red&quot;"/>
              <parameter key="GroupBy" value="&quot;a&quot;"/>
              <parameter key="Value" value="1"/>
            </list>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="append" compatibility="5.3.015" expanded="true" height="112" name="Append" width="90" x="179" y="120"/>
          <operator activated="true" class="sort" compatibility="5.3.015" expanded="true" height="76" name="sort by GroupBy" width="90" x="313" y="120">
            <parameter key="attribute_name" value="GroupBy"/>
          </operator>
          <operator activated="true" class="sort" compatibility="5.3.015" expanded="true" height="76" name="sort by StackBy" width="90" x="380" y="255">
            <parameter key="attribute_name" value="StackBy"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="5.3.015" expanded="true" height="94" name="Multiply" width="90" x="447" y="120"/>
          <operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples" width="90" x="581" y="30">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="GroupBy = a"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.015" expanded="true" height="76" name="Generate Attributes (2)" width="90" x="715" y="30">
            <list key="function_descriptions">
              <parameter key="GroupBy2" value="0"/>
            </list>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="5.3.015" expanded="true" height="76" name="Filter Examples (2)" width="90" x="581" y="165">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="GroupBy = b"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.3.015" expanded="true" height="76" name="Generate Attributes (3)" width="90" x="715" y="165">
            <list key="function_descriptions">
              <parameter key="GroupBy2" value="1"/>
            </list>
          </operator>
          <operator activated="true" class="union" compatibility="5.3.015" expanded="true" height="76" name="Union" width="90" x="849" y="30"/>
          <connect from_op="Generate Data by User Specification" from_port="output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Generate Data by User Specification (2)" from_port="output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Generate Data by User Specification (3)" from_port="output" to_op="Append" to_port="example set 3"/>
          <connect from_op="Append" from_port="merged set" to_op="sort by GroupBy" to_port="example set input"/>
          <connect from_op="sort by GroupBy" from_port="example set output" to_op="sort by StackBy" to_port="example set input"/>
          <connect from_op="sort by StackBy" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Filter Examples (2)" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Union" to_port="example set 1"/>
          <connect from_op="Filter Examples (2)" from_port="example set output" to_op="Generate Attributes (3)" to_port="example set input"/>
          <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Union" to_port="example set 2"/>
          <connect from_op="Union" from_port="union" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


    Cheers
    GzF