
Chart combining data and model

aborg Member Posts: 66 Contributor II
edited June 2019 in Help
Hi,

    I think I know how a model and its data can be visualized together: generate a new attribute with the model and add it as one of the series. However, this takes multiple steps, and setting up the right plot style for the model values can be laborious. The result can even be unsatisfactory: the points that generate the graph might not be dense enough everywhere, and sometimes the result is just a predicted (nominal) label where we would like to see a separating line.
    I guess not all models can be visualized this way, but most of them probably can. With kernel methods, I think such a chart would also help to understand some of the consequences, adjust the parameters, and select the proper functions.
What do you think? Would this be useful? Or is it already available?
Thanks, gabor

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Gabor,

    I am not sure if I understand your proposal correctly. Do you mean something like the graphics on page 15/16 of this book?

    That is already possible (more or less): use Generate Data to create a large dataset (say 1000 examples) with the same features as your true input data. Then apply the model, and use the Advanced Charts to create a chart with the predicted label on the color dimension and the true label as shape. That will create similar graphics.

    If I did not understand your request correctly or if you have questions about what I have written, please let me know.

    Best regards,
    Marius
  • aborg Member Posts: 66 Contributor II
    Hi Marius,

      Yes, I meant something like those figures. I was trying to describe an idea similar to yours, although I think it is not so easy to carry out:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.008">
      <context>
        <input>
          <location>//Samples/data/Ripley-Set</location>
        </input>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
        <process expanded="true" height="695" width="840">
          <operator activated="true" class="linear_regression" compatibility="5.2.008" expanded="true" height="94" name="Linear Regression" width="90" x="45" y="30">
            <parameter key="feature_selection" value="none"/>
            <parameter key="eliminate_colinear_features" value="false"/>
          </operator>
          <operator activated="true" class="generate_data" compatibility="5.2.008" expanded="true" height="60" name="Generate Data" width="90" x="180" y="30">
            <parameter key="target_function" value="grid function"/>
            <parameter key="number_examples" value="576"/>
            <parameter key="number_of_attributes" value="2"/>
            <parameter key="attributes_lower_bound" value="-1.3"/>
            <parameter key="attributes_upper_bound" value="1.0"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.2.008" expanded="true" height="76" name="Generate Attributes" width="90" x="315" y="30">
            <list key="function_descriptions">
              <parameter key="att1" value="(att1 + 10) / 20 * 2.3 - 1.3"/>
              <parameter key="att2" value="(att2 + 10) / 20 * 1.4 - 0.2"/>
            </list>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.2.008" expanded="true" height="76" name="Apply Model" width="90" x="450" y="30">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="5.2.008" expanded="true" height="76" name="Select Attributes" width="90" x="585" y="30">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="prediction(label)|att1|att2|"/>
            <parameter key="include_special_attributes" value="true"/>
          </operator>
          <operator activated="true" class="union" compatibility="5.2.008" expanded="true" height="76" name="Union" width="90" x="720" y="30"/>
          <connect from_port="input 1" to_op="Linear Regression" to_port="training set"/>
          <connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Linear Regression" from_port="exampleSet" to_op="Union" to_port="example set 2"/>
          <connect from_op="Linear Regression" from_port="weights" to_port="result 1"/>
          <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Union" to_port="example set 1"/>
          <connect from_op="Union" from_port="union" to_port="result 3"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="source_input 2" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
    And the result is not as nice as in those figures. [image] (A line marking the boundaries would be possible if the data and the model could be combined and plotted in one figure.)
    I guess this is not such an important feature request, as something similar can be done for some models in R, for example, but IMHO it would be nice if this were possible within RapidMiner too.
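
The grid approach discussed in this thread (generate evenly spaced points, apply the model, then look at where the prediction changes) can be sketched outside RapidMiner in a few lines of Python. This is only an illustration under stated assumptions: `predict` is a hypothetical hand-written linear classifier standing in for a trained model, and all function names are invented for this sketch. The 24 x 24 grid and the attribute ranges mirror the 576 examples and the rescaling expressions in the process above.

```python
def rescale(value, src_lo, src_hi, dst_lo, dst_hi):
    """Linearly map value from [src_lo, src_hi] to [dst_lo, dst_hi].

    rescale(att1, -10, 10, -1.3, 1.0) is the same mapping as the
    expression (att1 + 10) / 20 * 2.3 - 1.3 used in the process.
    """
    return (value - src_lo) / (src_hi - src_lo) * (dst_hi - dst_lo) + dst_lo


def make_grid(n_per_axis, x_range, y_range):
    """Return n_per_axis**2 evenly spaced (x, y) points covering both ranges."""
    grid = []
    for i in range(n_per_axis):
        x = rescale(i, 0, n_per_axis - 1, x_range[0], x_range[1])
        for j in range(n_per_axis):
            y = rescale(j, 0, n_per_axis - 1, y_range[0], y_range[1])
            grid.append((x, y))
    return grid


def predict(point, weights=(1.0, -1.0), bias=0.0):
    """Hypothetical stand-in model: label 1 if w . x + b > 0, else 0."""
    score = weights[0] * point[0] + weights[1] * point[1] + bias
    return 1 if score > 0 else 0


def boundary_points(grid, labels, n_per_axis):
    """Return grid points whose next neighbour in x or y has a different label."""
    boundary = []
    for i in range(n_per_axis):
        for j in range(n_per_axis):
            k = i * n_per_axis + j  # index of point (i, j) in the flat grid
            next_x = k + n_per_axis if i + 1 < n_per_axis else None
            next_y = k + 1 if j + 1 < n_per_axis else None
            for neighbour in (next_x, next_y):
                if neighbour is not None and labels[neighbour] != labels[k]:
                    boundary.append(grid[k])
                    break
    return boundary


N = 24  # 24 * 24 = 576 grid points
grid = make_grid(N, (-1.3, 1.0), (-0.2, 1.2))
labels = [predict(p) for p in grid]
boundary = boundary_points(grid, labels, N)
print(len(grid), "grid points,", len(boundary), "next to the decision boundary")
```

Plotting the `boundary` points together with the original data would approximate the separating line; a denser grid gives a smoother line at the cost of more model applications.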