"Getting the names of the attributes selected in Loop Attribute Subset"

wasperen Member Posts: 16 Contributor II
edited May 2019 in Help
I am using the Loop Attribute Subsets operator, and it nicely generates a collection of all combinations of the appropriate attributes.

But I would like to create an example set that says: combining A+B gives result X, combining A+C gives result Y, etc. Is there a way to find out, inside the loop, which attributes are currently being used?

Something like %{attributes} that gives me A;B. I could then add that as an attribute to my result set...

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    This is of course possible: you can use the "Log" operator to access the attributes used in the current iteration, the attribute count, and a performance value (if available). I have uploaded a sample process to myExperiment.org:

    http://www.myexperiment.org/workflows/2211.html

    You can easily download the process with our Community Extension from myExperiment (search in the forum for more information about the extension).

    The result is a table containing the attribute names and the attribute count. I also calculated a performance with an inner cross-validation and stored it in the table. Below is the result for "Golf":

    used_attributes                       used_number  performance
    Outlook, Temperature                  2.0          0.7
    Outlook, Temperature, Wind            3.0          0.7
    Outlook                               1.0          0.65
    Temperature                           1.0          0.65
    Outlook, Humidity                     2.0          0.65
    Humidity, Wind                        2.0          0.65
    Temperature, Humidity, Wind           3.0          0.65
    Wind                                  1.0          0.6
    Outlook, Wind                         2.0          0.6
    Temperature, Humidity                 2.0          0.6
    Temperature, Wind                     2.0          0.55
    Outlook, Temperature, Humidity        3.0          0.55
    Outlook, Temperature, Humidity, Wind  4.0          0.55
    Humidity                              1.0          0.45
    Outlook, Humidity, Wind               3.0          0.35
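
For readers who want to see the shape of this table outside RapidMiner, here is a minimal Python sketch of the same idea: visit every non-empty attribute subset and record its names and size. It is only an analogy, not how the Loop operator is implemented. The function name `enumerate_subsets` and the optional `score` callback are hypothetical; no performance values are computed here.

```python
from itertools import combinations


def enumerate_subsets(attributes, score=None):
    """Return one log row per non-empty subset of the given attributes."""
    rows = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            row = {
                "used_attributes": ", ".join(subset),
                "used_number": len(subset),
            }
            # 'score' is a hypothetical callback standing in for the
            # inner cross-validation; it is skipped when not supplied.
            if score is not None:
                row["performance"] = score(subset)
            rows.append(row)
    return rows


golf = ["Outlook", "Temperature", "Humidity", "Wind"]
rows = enumerate_subsets(golf)
for row in rows:
    print(row["used_attributes"], row["used_number"])
```

Four attributes give 15 non-empty subsets, which matches the number of rows in the table above.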


    Hope that helps,
    Ingo

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.008">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
        <process expanded="true" height="674" width="919">
          <operator activated="true" class="retrieve" compatibility="5.1.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Golf"/>
          </operator>
          <operator activated="true" class="loop_attribute_subsets" compatibility="5.1.008" expanded="true" height="60" name="Loop Subsets" width="90" x="179" y="30">
            <process expanded="true" height="674" width="919">
              <operator activated="true" class="x_validation" compatibility="5.1.008" expanded="true" height="112" name="Validation" width="90" x="45" y="30">
                <process expanded="true" height="674" width="434">
                  <operator activated="true" class="decision_tree" compatibility="5.1.008" expanded="true" height="76" name="Decision Tree" width="90" x="45" y="30"/>
                  <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
                  <connect from_op="Decision Tree" from_port="model" to_port="model"/>
                  <portSpacing port="source_training" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                </process>
                <process expanded="true" height="674" width="434">
                  <operator activated="true" class="apply_model" compatibility="5.1.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                    <list key="application_parameters"/>
                  </operator>
                  <operator activated="true" class="performance" compatibility="5.1.008" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_averagable 1" spacing="0"/>
                  <portSpacing port="sink_averagable 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="log" compatibility="5.1.008" expanded="true" height="76" name="Log" width="90" x="179" y="30">
                <list key="log">
                  <parameter key="used_attributes" value="operator.Loop Subsets.value.feature_names"/>
                  <parameter key="used_number" value="operator.Loop Subsets.value.feature_number"/>
                  <parameter key="performance" value="operator.Validation.value.performance"/>
                </list>
              </operator>
              <connect from_port="example set" to_op="Validation" to_port="training"/>
              <connect from_op="Validation" from_port="averagable 1" to_op="Log" to_port="through 1"/>
              <portSpacing port="source_example set" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Loop Subsets" to_port="example set"/>
          <connect from_op="Loop Subsets" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
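
To check which values a Log operator records without opening the process in RapidMiner, the XML can be inspected with a short script. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the Log operator from the process above. The reading of each value as `operator.<operator name>.value.<value name>` is an assumption inferred from the three entries shown, and `log_columns` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Log operator from the process above.
PROCESS_XML = """
<operator class="log" name="Log">
  <list key="log">
    <parameter key="used_attributes" value="operator.Loop Subsets.value.feature_names"/>
    <parameter key="used_number" value="operator.Loop Subsets.value.feature_number"/>
    <parameter key="performance" value="operator.Validation.value.performance"/>
  </list>
</operator>
"""


def log_columns(xml_text):
    """Return (column, source operator, value name) for each logged value."""
    root = ET.fromstring(xml_text)
    columns = []
    for op in root.iter("operator"):  # iter() also visits the root element
        if op.get("class") != "log":
            continue
        for param in op.findall("./list[@key='log']/parameter"):
            # Assumed format: operator.<operator name>.value.<value name>
            _, rest = param.get("value").split(".", 1)
            source, value_name = rest.rsplit(".value.", 1)
            columns.append((param.get("key"), source, value_name))
    return columns


for column, source, value_name in log_columns(PROCESS_XML):
    print(f"{column}: {value_name} from '{source}'")
```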
  • wasperen Member Posts: 16 Contributor II
    Hi Ingo (yes, I learn quickly),

    Thanks for this. It is a bit of a detour, but it works for me.

    Kind regards,
    Willem
  • wasperen Member Posts: 16 Contributor II
    By the way, using this logger in an Optimize Selection (Brute Force) does not give proper values for the feature_names value... Or so it seems. Only one shows up. Is that maybe because of the parallel execution?
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    wasperen wrote: "Hi Ingo (yes, I learn quickly)"
    :D thanks for the greetings...

    wasperen wrote: "Using this logger in an Optimize Selection (Brute Force) does not give proper values for the feature_names value... Or so it seems. Only one shows up. Is that maybe because of the parallel execution?"
    No, the reason is actually much simpler and lies in the implementation: the "Optimize Selection (...)" operators deliver only the feature names of the best individual found so far, since all of those algorithms are population-based (similar to evolutionary approaches). Delivering the feature names of every set in the current population would be an option, but then one would not know which performance belongs to which feature set. If you want to see this level of detail, the loop operator is probably the better option.
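
The difference between the two logging behaviours can be sketched in a few lines of Python. This is an illustration of the effect described above, not RapidMiner's implementation: one logger only ever sees the best feature set found so far, the other sees the set evaluated in the current iteration. The function names and the toy scores are hypothetical and chosen purely for illustration.

```python
def best_so_far_log(candidates, score):
    """After each evaluation, log only the best feature set seen so far."""
    log = []
    best, best_score = None, float("-inf")
    for subset in candidates:
        current = score(subset)
        if current > best_score:
            best, best_score = subset, current
        log.append((", ".join(best), best_score))
    return log


def per_candidate_log(candidates, score):
    """Log every evaluated feature set together with its own score."""
    return [(", ".join(subset), score(subset)) for subset in candidates]


# Hypothetical scores, for illustration only.
toy_scores = {("A",): 0.9, ("B",): 0.5, ("A", "B"): 0.7}
candidates = list(toy_scores)

print(best_so_far_log(candidates, toy_scores.get))
print(per_candidate_log(candidates, toy_scores.get))
```

With these toy scores the first logger repeats the same feature set in every row, which resembles the "only one shows up" symptom, while the second keeps each feature set paired with its own score.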

    Cheers,
    Ingo