Apply model on many columns permutation

louism Member Posts: 8 Contributor II
edited November 2018 in Help
I have 100 columns and I'd like to do linear regression between all of them. Is there an automatic way of doing that and saving the coefficients?

Thanks!

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    what exactly do you mean by "between all of them"?

    Best regards,
    Marius
  • louism Member Posts: 8 Contributor II
    Hi,

    Suppose I have attributes, A, B, C…Z

    I need the linear regression coefficients for A->B, A->C … A->Z, B->A, B->C … Z->A. That is, each attribute in turn becomes the label, and a regression coefficient is computed for each of the other attributes.

    These are to be computed as simple linear regressions (one predictor at a time), not as multiple linear regression.

    Given the number of attributes, I just can't do this by hand.  :)

    Thanks for your reply!
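As a rough illustration of the request above, here is a minimal sketch in plain Java, independent of RapidMiner. It computes the least-squares slope for every ordered pair of columns. The class and method names (`PairwiseSlopes`, `slope`, `allPairs`) and the sample data are made up for this sketch and are not part of any RapidMiner API.

```java
// Sketch only: plain Java, no RapidMiner classes involved.
class PairwiseSlopes {

    /** Least-squares slope m of the simple regression y = m*x + b. */
    static double slope(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        return sxy / sxx; // NaN when x is constant (0.0 / 0.0)
    }

    /**
     * slopes[p][l] is the slope obtained when column p is the predictor
     * and column l is the label. The diagonal (p == l) is left as NaN,
     * which plays the role of the "skip identical attributes" check.
     */
    static double[][] allPairs(double[][] columns) {
        int k = columns.length;
        double[][] slopes = new double[k][k];
        for (int label = 0; label < k; label++) {
            for (int pred = 0; pred < k; pred++) {
                slopes[pred][label] = (pred == label)
                        ? Double.NaN
                        : slope(columns[pred], columns[label]);
            }
        }
        return slopes;
    }

    public static void main(String[] args) {
        // Hypothetical data: three columns A, B, C with four rows each.
        double[][] columns = {
            {1, 2, 3, 4},
            {2, 4, 6, 8},
            {4, 3, 2, 1}
        };
        double[][] slopes = allPairs(columns);
        for (int p = 0; p < slopes.length; p++) {
            for (int l = 0; l < slopes.length; l++) {
                if (p != l) {
                    System.out.println("col" + p + " -> col" + l + ": m = " + slopes[p][l]);
                }
            }
        }
    }
}
```

With k columns this produces k*(k-1) slopes, so 100 attributes give 9,900 regressions.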
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Ah, ok. All you need are two nested Loop Attributes operators, some filtering, and a Branch operator to skip the invalid case where both loops point at the same attribute :)

    Please have a look at the attached process, it does the job for you ;)

    Best regards,
    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.0.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.0.003" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="6.0.003" expanded="true" height="60" name="Retrieve Sonar" width="90" x="112" y="30">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="loop_attributes" compatibility="6.0.003" expanded="true" height="94" name="Loop Attributes" width="90" x="246" y="30">
            <parameter key="iteration_macro" value="a1"/>
            <process expanded="true">
              <operator activated="true" class="loop_attributes" compatibility="6.0.003" expanded="true" height="94" name="Loop Attributes (2)" width="90" x="45" y="30">
                <parameter key="iteration_macro" value="a2"/>
                <process expanded="true">
                  <operator activated="true" class="branch" compatibility="6.0.003" expanded="true" height="94" name="Branch" width="90" x="45" y="30">
                    <parameter key="condition_type" value="expression"/>
                    <parameter key="condition_value" value="&quot;%{a1}&quot; != &quot;%{a2}&quot;"/>
                    <process expanded="true">
                      <operator activated="true" class="select_attributes" compatibility="6.0.003" expanded="true" height="76" name="Select Attributes" width="90" x="44" y="30">
                        <parameter key="attribute_filter_type" value="regular_expression"/>
                        <parameter key="regular_expression" value="%{a1}|%{a2}"/>
                        <parameter key="include_special_attributes" value="true"/>
                      </operator>
                      <operator activated="true" class="set_role" compatibility="6.0.003" expanded="true" height="76" name="Set Role" width="90" x="178" y="30">
                        <parameter key="attribute_name" value="%{a2}"/>
                        <parameter key="target_role" value="label"/>
                        <list key="set_additional_roles"/>
                      </operator>
                      <operator activated="true" class="linear_regression" compatibility="6.0.003" expanded="true" height="94" name="Linear Regression" width="90" x="313" y="30">
                        <parameter key="feature_selection" value="none"/>
                        <parameter key="eliminate_colinear_features" value="false"/>
                      </operator>
                      <operator activated="true" class="weights_to_data" compatibility="6.0.003" expanded="true" height="60" name="Weights to Data" width="90" x="446" y="30"/>
                      <operator activated="true" class="generate_attributes" compatibility="6.0.003" expanded="true" height="76" name="Generate Attributes" width="90" x="581" y="30">
                        <list key="function_descriptions">
                          <parameter key="label" value="&quot;%{a2}&quot;"/>
                        </list>
                      </operator>
                      <connect from_port="input 1" to_op="Select Attributes" to_port="example set input"/>
                      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
                      <connect from_op="Set Role" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
                      <connect from_op="Linear Regression" from_port="weights" to_op="Weights to Data" to_port="attribute weights"/>
                      <connect from_op="Weights to Data" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
                      <connect from_op="Generate Attributes" from_port="example set output" to_port="input 1"/>
                      <portSpacing port="source_condition" spacing="0"/>
                      <portSpacing port="source_input 1" spacing="0"/>
                      <portSpacing port="source_input 2" spacing="0"/>
                      <portSpacing port="sink_input 1" spacing="0"/>
                      <portSpacing port="sink_input 2" spacing="0"/>
                    </process>
                    <process expanded="true">
                      <portSpacing port="source_condition" spacing="0"/>
                      <portSpacing port="source_input 1" spacing="0"/>
                      <portSpacing port="source_input 2" spacing="0"/>
                      <portSpacing port="sink_input 1" spacing="0"/>
                      <portSpacing port="sink_input 2" spacing="0"/>
                    </process>
                  </operator>
                  <connect from_port="example set" to_op="Branch" to_port="input 1"/>
                  <connect from_op="Branch" from_port="input 1" to_port="result 1"/>
                  <portSpacing port="source_example set" spacing="0"/>
                  <portSpacing port="sink_example set" spacing="0"/>
                  <portSpacing port="sink_result 1" spacing="0"/>
                  <portSpacing port="sink_result 2" spacing="0"/>
                </process>
              </operator>
              <connect from_port="example set" to_op="Loop Attributes (2)" to_port="example set"/>
              <connect from_op="Loop Attributes (2)" from_port="result 1" to_port="result 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_example set" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="append" compatibility="6.0.003" expanded="true" height="76" name="Append" width="90" x="380" y="30"/>
          <connect from_op="Retrieve Sonar" from_port="output" to_op="Loop Attributes" to_port="example set"/>
          <connect from_op="Loop Attributes" from_port="result 1" to_op="Append" to_port="example set 1"/>
          <connect from_op="Append" from_port="merged set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • louism Member Posts: 8 Contributor II
    LOL! Glad I asked! It would have taken me years to figure that out. Thank you so much!

    That is just perfect, because I can pull the result into Excel and have a ton of regressions figured out in an instant, which is exactly what I need.

    I now have m. How could I extract b in the same table (the b in Y = mX + b)?
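One route that avoids the model internals: for a simple (single-predictor) least-squares regression with an intercept term, the fitted line passes through the point of means, so b = mean(Y) - m * mean(X). The sketch below assumes the model was fitted that way, without any regularisation that would shift the result noticeably. The class and method names are invented for the example, and the same formula can be typed directly into Excel.

```java
// Sketch only: recovers the intercept from a known slope and the column means.
class InterceptFromSlope {

    /** Arithmetic mean of a column. */
    static double mean(double[] values) {
        double sum = 0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Intercept b of y = m*x + b, given the least-squares slope m. */
    static double intercept(double m, double[] x, double[] y) {
        return mean(y) - m * mean(x);
    }

    public static void main(String[] args) {
        // Hypothetical data lying exactly on y = 2x + 1.
        double[] x = {1, 2, 3};
        double[] y = {3, 5, 7};
        double m = 2.0; // slope as exported from the regression weights
        System.out.println("b = " + intercept(m, x, y));
    }
}
```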
  • fras Member Posts: 93 Contributor II
    This is not as simple as it is for the weights, because there is no port that delivers the intercept.
    You have to dive into RapidMiner's API with the "Execute Script" operator, as in the case of
    the parameters of a decision tree:
    http://rapid-i.com/rapidforum/index.php/topic,7834.0.html

  • louism Member Posts: 8 Contributor II
    Thanks, fras. I am not familiar with Java, but I can make it out. Where can I see how to access the different parts of the linear regression output (the coefficient of an attribute and the intercept)? I did not find any documentation that technical. For example, where could I look up what the equivalents of j48.measureTreeSize() and j48.measureNumLeaves() from your example (below) are called?
    WekaClassifier classifier = (WekaClassifier)input[0];
    J48 j48 = classifier.getClassifier();
    LogService.getRoot().log(Level.INFO, "WEKA TREE SIZE: "+j48.measureTreeSize());
    LogService.getRoot().log(Level.INFO, "WEKA LEAF NUMBER: "+j48.measureNumLeaves());