Feature Weighting: Filter, Wrapper and Embedded - Some Questions

MoWei Member Posts: 18 Maven
edited September 2019 in Help
Hello everyone,

In the last few days I have looked more closely at feature selection in order to compare the different approaches with each other. Some questions came up, and I would be grateful if you could help me.
I learned that feature selection (FS) methods are generally grouped into "filter", "wrapper" and "embedded" methods, so I tried something from each approach. My dataset consists of about 20 numeric attributes and 10,000 examples, and the goal is to identify the most relevant features for a numeric label.

Filter Approach
I used the operators "Weight by Correlation" and "Weight by SVM" and normalized the weights (by checking "normalize weights" in the Parameters panel). Because both operators then return values between 0 and 1, I can compare the two sets of weights directly. For example, Screenshot 1 shows the results from "Weight by SVM", normalized on the left and not normalized on the right, for a later comparison.

Screenshot 1: Weight Results from "Weight by SVM" (left WITH normalization, right WITHOUT normalization)

Wrapper Approach
Here I used the "Forward Selection", "Backward Elimination" and "Optimize Selection (Evolutionary)" operators. Is it possible to get feature weights between 0 AND 1? Currently the weighting only outputs a 0 OR a 1. If I understand correctly, this is not possible, because different subsets are tested and the subset with the best performance is selected, right? (I followed Ingo Mierswa's blog posts for the wrapper approaches, see LINK.)

Embedded Approach
Here I used the weighting function of the models. For example, the "SVM" operator and the "Linear Regression" operator can output weights. First question: is it correct that weighting by these operators is a so-called embedded approach? Furthermore, I would like to compare the model-based weights with the weights from the filter methods. The problem is that I have not yet found a way to normalize the model-based weights in the same way as with the filter methods. In the filter methods I can check "normalize weights" in the Parameters panel; the model operators do not offer this option. I already tried the "Weights to Data" operator followed by the "Normalize" operator (Screenshot 2) with the method "range transformation", min 0 and max 1, but I do not get the expected results (Screenshot 3). The non-normalized weights are approximately the same (see Screenshots 1 and 3 on the right), but the normalized weights are not (see Screenshots 1 and 3 on the left). For example, why does the attribute 0.07_ have a weight of 0 below (Screenshot 3) while above (Screenshot 1) it has a weight of 0.539?

Screenshot 2: Normalize the Weights of the model "SVM"


Screenshot 3: Weight Results from "Model SVM" (left WITH normalization, right WITHOUT normalization). Both contain 18 attributes; only the presentation is slightly different, so please do not be confused by the height of the two screenshots.

My goal is to get feature weights between 0 and 1 for every method I used, so that I can compare all methods directly with each other.

Thank you very much.
Best regards
Moritz

Best Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,254 RM Data Scientist
    edited September 2019 Solution Accepted
    Hi @MoWei ,

    The key operator you need for the comparison is Weights to Data. It converts the AttributeWeights object (wei, red) into a normal example set. Afterwards you can join or append the results. Here is a way to do this:



    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve Sonar" width="90" x="112" y="136">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.3.001" expanded="true" height="124" name="Multiply" width="90" x="246" y="136"/>
          <operator activated="true" class="concurrency:parallel_random_forest" compatibility="9.3.001" expanded="true" height="103" name="Random Forest" width="90" x="380" y="340">
            <parameter key="number_of_trees" value="100"/>
            <parameter key="criterion" value="gain_ratio"/>
            <parameter key="maximal_depth" value="10"/>
            <parameter key="apply_pruning" value="false"/>
            <parameter key="confidence" value="0.1"/>
            <parameter key="apply_prepruning" value="false"/>
            <parameter key="minimal_gain" value="0.01"/>
            <parameter key="minimal_leaf_size" value="2"/>
            <parameter key="minimal_size_for_split" value="4"/>
            <parameter key="number_of_prepruning_alternatives" value="3"/>
            <parameter key="random_splits" value="false"/>
            <parameter key="guess_subset_ratio" value="true"/>
            <parameter key="subset_ratio" value="0.2"/>
            <parameter key="voting_strategy" value="confidence vote"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="enable_parallel_execution" value="true"/>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="9.3.001" expanded="true" height="68" name="Weights to Data (3)" width="90" x="514" y="391"/>
          <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="648" y="391">
            <list key="function_descriptions">
              <parameter key="name" value="&quot;RF&quot;"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="weight_by_correlation" compatibility="9.3.001" expanded="true" height="82" name="Weight by Correlation" width="90" x="380" y="85">
            <parameter key="normalize_weights" value="false"/>
            <parameter key="sort_weights" value="true"/>
            <parameter key="sort_direction" value="ascending"/>
            <parameter key="squared_correlation" value="false"/>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="9.3.001" expanded="true" height="68" name="Weights to Data" width="90" x="514" y="85"/>
          <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes" width="90" x="648" y="85">
            <list key="function_descriptions">
              <parameter key="name" value="&quot;Weight by Correlation&quot;"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="optimize_selection_backward" compatibility="9.3.001" expanded="true" height="103" name="Backward Elimination" width="90" x="380" y="187">
            <parameter key="maximal_number_of_eliminations" value="2"/>
            <parameter key="speculative_rounds" value="0"/>
            <parameter key="stopping_behavior" value="with decrease"/>
            <parameter key="use_relative_decrease" value="true"/>
            <parameter key="alpha" value="0.05"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:cross_validation" compatibility="9.3.001" expanded="true" height="145" name="Validation" width="90" x="45" y="30">
                <parameter key="split_on_batch_attribute" value="false"/>
                <parameter key="leave_one_out" value="false"/>
                <parameter key="number_of_folds" value="5"/>
                <parameter key="sampling_type" value="stratified sampling"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
                <parameter key="enable_parallel_execution" value="true"/>
                <process expanded="true">
                  <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.3.001" expanded="true" height="103" name="Decision Tree" width="90" x="45" y="34">
                    <parameter key="criterion" value="gain_ratio"/>
                    <parameter key="maximal_depth" value="10"/>
                    <parameter key="apply_pruning" value="true"/>
                    <parameter key="confidence" value="0.1"/>
                    <parameter key="apply_prepruning" value="true"/>
                    <parameter key="minimal_gain" value="0.01"/>
                    <parameter key="minimal_leaf_size" value="2"/>
                    <parameter key="minimal_size_for_split" value="4"/>
                    <parameter key="number_of_prepruning_alternatives" value="3"/>
                  </operator>
                  <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
                  <connect from_op="Decision Tree" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                  <description align="left" color="green" colored="true" height="80" resized="true" width="248" x="37" y="158">In the training phase, a model is built on the current training data set. (90 % of data by default, 10 times)</description>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="9.3.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
                    <list key="application_parameters"/>
                    <parameter key="create_view" value="false"/>
                  </operator>
                  <operator activated="true" class="performance" compatibility="9.3.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
                    <parameter key="use_example_weights" value="true"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
                  <connect from_op="Performance" from_port="example set" to_port="test set results"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_test set results" spacing="0"/>
                  <portSpacing port="sink_performance 1" spacing="0"/>
                  <portSpacing port="sink_performance 2" spacing="0"/>
                  <description align="left" color="blue" colored="true" height="103" resized="true" width="315" x="38" y="158">The model created in the Training step is applied to the current test set (10 %).&lt;br/&gt;The performance is evaluated and sent to the operator results.</description>
                </process>
                <description align="center" color="transparent" colored="false" width="126">A cross-validation evaluating a decision tree model.</description>
              </operator>
              <connect from_port="example set" to_op="Validation" to_port="example set"/>
              <connect from_op="Validation" from_port="performance 1" to_port="performance"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="9.3.001" expanded="true" height="68" name="Weights to Data (2)" width="90" x="514" y="187"/>
          <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="648" y="187">
            <list key="function_descriptions">
              <parameter key="name" value="&quot;Backwards&quot;"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="append" compatibility="9.3.001" expanded="true" height="124" name="Append" width="90" x="782" y="136">
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="merge_type" value="all"/>
          </operator>
          <operator activated="true" class="blending:pivot" compatibility="9.3.001" expanded="true" height="82" name="Pivot" width="90" x="916" y="136">
            <parameter key="group_by_attributes" value="Attribute"/>
            <parameter key="column_grouping_attribute" value="name"/>
            <list key="aggregation_attributes">
              <parameter key="Weight" value="average"/>
            </list>
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="default_aggregation_function" value="first"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="9.3.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="1050" y="136">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="average\((\W+)\)"/>
            <parameter key="replace_by" value="$1"/>
          </operator>
          <connect from_op="Retrieve Sonar" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Weight by Correlation" to_port="example set"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Backward Elimination" to_port="example set"/>
          <connect from_op="Multiply" from_port="output 3" to_op="Random Forest" to_port="training set"/>
          <connect from_op="Random Forest" from_port="weights" to_op="Weights to Data (3)" to_port="attribute weights"/>
          <connect from_op="Weights to Data (3)" from_port="example set" to_op="Generate Attributes (3)" to_port="example set input"/>
          <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Append" to_port="example set 3"/>
          <connect from_op="Weight by Correlation" from_port="weights" to_op="Weights to Data" to_port="attribute weights"/>
          <connect from_op="Weights to Data" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Backward Elimination" from_port="attribute weights" to_op="Weights to Data (2)" to_port="attribute weights"/>
          <connect from_op="Weights to Data (2)" from_port="example set" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Append" from_port="merged set" to_op="Pivot" to_port="input"/>
          <connect from_op="Pivot" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
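
    The append-then-pivot step at the end of the process above can be sketched outside RapidMiner as well. The following is only an illustration in plain Python, not what the operators do internally: the rows, attribute names and weight values are made up, and the column names "Attribute", "Weight" and "name" are taken from the Pivot parameters in the process.

```python
from collections import defaultdict

# Hypothetical long-format rows, as they might look after appending the three
# Weights to Data outputs, each tagged with a "name" column for its method.
rows = [
    {"Attribute": "attribute_1", "Weight": 0.30, "name": "Weight by Correlation"},
    {"Attribute": "attribute_2", "Weight": 0.10, "name": "Weight by Correlation"},
    {"Attribute": "attribute_1", "Weight": 1.0, "name": "Backwards"},
    {"Attribute": "attribute_2", "Weight": 0.0, "name": "Backwards"},
    {"Attribute": "attribute_1", "Weight": 0.25, "name": "RF"},
    {"Attribute": "attribute_2", "Weight": 0.05, "name": "RF"},
]


def pivot_weights(rows):
    """Group by Attribute, spread 'name' into columns, average duplicate cells."""
    cells = defaultdict(list)
    for row in rows:
        cells[(row["Attribute"], row["name"])].append(row["Weight"])
    table = defaultdict(dict)
    for (attribute, method), values in cells.items():
        table[attribute][method] = sum(values) / len(values)
    return {attribute: dict(cols) for attribute, cols in table.items()}


wide = pivot_weights(rows)
for attribute, cols in sorted(wide.items()):
    print(attribute, cols)
```

    Each attribute ends up as one row with one weight column per method, which is the shape needed for a side-by-side comparison.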

    You can also do this with loops, since Append can handle the output of a loop.

    On your Normalization Issue:
    The normalize flag just divides by the maximum value, so that the maximum becomes one. You can do the same yourself after converting the weights to a table.
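
    To see why this differs from a range transformation to [0, 1], here is a small sketch in plain Python. The weights are made up, both function names are hypothetical, and the sketch assumes non-negative weights that are not all equal. Dividing by the maximum keeps the smallest weight above zero, whereas a range transformation always maps the smallest weight to exactly 0, which may be why an attribute shows up with weight 0 in one result and not in the other.

```python
def divide_by_max(weights):
    """Scale so that the largest weight becomes 1 (assumes a positive maximum)."""
    top = max(weights.values())
    return {name: value / top for name, value in weights.items()}


def range_transform(weights):
    """Min-max scaling to [0, 1] (assumes the weights are not all equal)."""
    low, high = min(weights.values()), max(weights.values())
    return {name: (value - low) / (high - low) for name, value in weights.items()}


# Hypothetical raw weights for three attributes.
raw = {"a": 2.0, "b": 4.0, "c": 8.0}

by_max = divide_by_max(raw)
by_range = range_transform(raw)

print(by_max)    # smallest weight stays above 0
print(by_range)  # smallest weight is mapped to 0
```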

    A few additional comments:

    - Weight by SVM is effectively an embedded method, since it uses the support vectors of the SVM. It is not really a filter method.
    - For all of these, keep Select by Weights in mind to select the attributes you want.
    - For embedded methods: the choice of hyper-parameters may change the results significantly!
    - Please also have a look at MRMR-FS from the Feature Selection extension. It is a great algorithm.

    Cheers,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,254 RM Data Scientist
    Solution Accepted
    Hi @MoWei ,

    For the replace:
    If you use Pivot, you need to specify an aggregation function: if two values fall into the same cell, you have to define what is done with them. By default it is average, so you get attributes named like average(xxx)_yyy.
    I thought that this "average" might be confusing, so I decided to remove the average( .. ) part. That's what the regex does.

    For the normalization:
    The trick is to put the maximum of the weights into a process variable (called a macro). Afterwards you can just divide by it. See the attached process.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve Sonar" width="90" x="246" y="289">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="weight_by_correlation" compatibility="9.3.001" expanded="true" height="82" name="Weight by Correlation" width="90" x="514" y="289">
            <parameter key="normalize_weights" value="false"/>
            <parameter key="sort_weights" value="true"/>
            <parameter key="sort_direction" value="ascending"/>
            <parameter key="squared_correlation" value="false"/>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="9.3.001" expanded="true" height="68" name="Weights to Data" width="90" x="648" y="289"/>
          <operator activated="true" class="extract_macro" compatibility="9.3.001" expanded="true" height="68" name="Extract Macro" width="90" x="782" y="289">
            <parameter key="macro" value="maxWeight"/>
            <parameter key="macro_type" value="statistics"/>
            <parameter key="statistics" value="max"/>
            <parameter key="attribute_name" value="Weight"/>
            <list key="additional_macros"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes" width="90" x="916" y="289">
            <list key="function_descriptions">
              <parameter key="Weight" value="Weight/eval(%{maxWeight})"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <connect from_op="Retrieve Sonar" from_port="output" to_op="Weight by Correlation" to_port="example set"/>
          <connect from_op="Weight by Correlation" from_port="weights" to_op="Weights to Data" to_port="attribute weights"/>
          <connect from_op="Weights to Data" from_port="example set" to_op="Extract Macro" to_port="example set"/>
          <connect from_op="Extract Macro" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>



    ~Martin


  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,254 RM Data Scientist
    Solution Accepted
    Hi @MoWei ,

    Mea culpa. Here is a version with the correct regex.

    Best,
    Martin 

    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve Sonar" width="90" x="112" y="136">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.3.001" expanded="true" height="124" name="Multiply" width="90" x="246" y="136"/>
          <operator activated="true" class="concurrency:parallel_random_forest" compatibility="9.3.001" expanded="true" height="103" name="Random Forest" width="90" x="380" y="340">
            <parameter key="number_of_trees" value="100"/>
            <parameter key="criterion" value="gain_ratio"/>
            <parameter key="maximal_depth" value="10"/>
            <parameter key="apply_pruning" value="false"/>
            <parameter key="confidence" value="0.1"/>
            <parameter key="apply_prepruning" value="false"/>
            <parameter key="minimal_gain" value="0.01"/>
            <parameter key="minimal_leaf_size" value="2"/>
            <parameter key="minimal_size_for_split" value="4"/>
            <parameter key="number_of_prepruning_alternatives" value="3"/>
            <parameter key="random_splits" value="false"/>
            <parameter key="guess_subset_ratio" value="true"/>
            <parameter key="subset_ratio" value="0.2"/>
            <parameter key="voting_strategy" value="confidence vote"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="enable_parallel_execution" value="true"/>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="9.3.001" expanded="true" height="68" name="Weights to Data (3)" width="90" x="514" y="391"/>
          <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="648" y="391">
            <list key="function_descriptions">
              <parameter key="name" value="&quot;RF&quot;"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="weight_by_correlation" compatibility="9.3.001" expanded="true" height="82" name="Weight by Correlation" width="90" x="380" y="85">
            <parameter key="normalize_weights" value="false"/>
            <parameter key="sort_weights" value="true"/>
            <parameter key="sort_direction" value="ascending"/>
            <parameter key="squared_correlation" value="false"/>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="9.3.001" expanded="true" height="68" name="Weights to Data" width="90" x="514" y="85"/>
          <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes" width="90" x="648" y="85">
            <list key="function_descriptions">
              <parameter key="name" value="&quot;Weight by Correlation&quot;"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="optimize_selection_backward" compatibility="9.3.001" expanded="true" height="103" name="Backward Elimination" width="90" x="380" y="187">
            <parameter key="maximal_number_of_eliminations" value="2"/>
            <parameter key="speculative_rounds" value="0"/>
            <parameter key="stopping_behavior" value="with decrease"/>
            <parameter key="use_relative_decrease" value="true"/>
            <parameter key="alpha" value="0.05"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:cross_validation" compatibility="9.3.001" expanded="true" height="145" name="Validation" width="90" x="45" y="30">
                <parameter key="split_on_batch_attribute" value="false"/>
                <parameter key="leave_one_out" value="false"/>
                <parameter key="number_of_folds" value="5"/>
                <parameter key="sampling_type" value="stratified sampling"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
                <parameter key="enable_parallel_execution" value="true"/>
                <process expanded="true">
                  <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.3.001" expanded="true" height="103" name="Decision Tree" width="90" x="45" y="34">
                    <parameter key="criterion" value="gain_ratio"/>
                    <parameter key="maximal_depth" value="10"/>
                    <parameter key="apply_pruning" value="true"/>
                    <parameter key="confidence" value="0.1"/>
                    <parameter key="apply_prepruning" value="true"/>
                    <parameter key="minimal_gain" value="0.01"/>
                    <parameter key="minimal_leaf_size" value="2"/>
                    <parameter key="minimal_size_for_split" value="4"/>
                    <parameter key="number_of_prepruning_alternatives" value="3"/>
                  </operator>
                  <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
                  <connect from_op="Decision Tree" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                  <description align="left" color="green" colored="true" height="80" resized="true" width="248" x="37" y="158">In the training phase, a model is built on the current training data set. (90 % of data by default, 10 times)</description>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="9.3.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
                    <list key="application_parameters"/>
                    <parameter key="create_view" value="false"/>
                  </operator>
                  <operator activated="true" class="performance" compatibility="9.3.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
                    <parameter key="use_example_weights" value="true"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
                  <connect from_op="Performance" from_port="example set" to_port="test set results"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_test set results" spacing="0"/>
                  <portSpacing port="sink_performance 1" spacing="0"/>
                  <portSpacing port="sink_performance 2" spacing="0"/>
                  <description align="left" color="blue" colored="true" height="103" resized="true" width="315" x="38" y="158">The model created in the Training step is applied to the current test set (10 %).&lt;br/&gt;The performance is evaluated and sent to the operator results.</description>
                </process>
                <description align="center" color="transparent" colored="false" width="126">A cross-validation evaluating a decision tree model.</description>
              </operator>
              <connect from_port="example set" to_op="Validation" to_port="example set"/>
              <connect from_op="Validation" from_port="performance 1" to_port="performance"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="9.3.001" expanded="true" height="68" name="Weights to Data (2)" width="90" x="514" y="187"/>
          <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="648" y="187">
            <list key="function_descriptions">
              <parameter key="name" value="&quot;Backwards&quot;"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="append" compatibility="9.3.001" expanded="true" height="124" name="Append" width="90" x="782" y="136">
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="merge_type" value="all"/>
          </operator>
          <operator activated="true" class="blending:pivot" compatibility="9.3.001" expanded="true" height="82" name="Pivot" width="90" x="916" y="136">
            <parameter key="group_by_attributes" value="Attribute"/>
            <parameter key="column_grouping_attribute" value="name"/>
            <list key="aggregation_attributes">
              <parameter key="Weight" value="average"/>
            </list>
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="default_aggregation_function" value="first"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="9.3.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="1050" y="136">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="average\((.+)\)(.+)"/>
            <parameter key="replace_by" value="$1$2"/>
          </operator>
          <connect from_op="Retrieve Sonar" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Weight by Correlation" to_port="example set"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Backward Elimination" to_port="example set"/>
          <connect from_op="Multiply" from_port="output 3" to_op="Random Forest" to_port="training set"/>
          <connect from_op="Random Forest" from_port="weights" to_op="Weights to Data (3)" to_port="attribute weights"/>
          <connect from_op="Weights to Data (3)" from_port="example set" to_op="Generate Attributes (3)" to_port="example set input"/>
          <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Append" to_port="example set 3"/>
          <connect from_op="Weight by Correlation" from_port="weights" to_op="Weights to Data" to_port="attribute weights"/>
          <connect from_op="Weights to Data" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Backward Elimination" from_port="attribute weights" to_op="Weights to Data (2)" to_port="attribute weights"/>
          <connect from_op="Weights to Data (2)" from_port="example set" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Append" from_port="merged set" to_op="Pivot" to_port="input"/>
          <connect from_op="Pivot" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany

Answers

  • MoWeiMoWei Member Posts: 18 Maven
    Hey @mschmitz,
    Thank you very much for your answer. It has definitely helped me a lot.

    I still have two small questions:
    1. What is the "Rename by Replacing" operator for? I can't see that it has any effect.

    2. Regarding your note:
    "On your Normalization Issue:
    The normalize flag is just dividing by the maximum value, so that the max value is one. You can just do this after converting it to a table."
    I can't work out how to implement that :( Maybe it is already too late in the day. I would do this after "Weights to Data" with the "Generate Attributes" operator, but I don't know which function expression to enter, because I have never really worked with these expressions before. Could you tell me what I need to put there? I tried "Weight/max(Weight)", but unfortunately this does not give the desired result.

    Thank you very much.

    Best regards

    Moritz
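
As a rough illustration of the note quoted above, here is a minimal Python sketch (not RapidMiner code) that contrasts dividing by the maximum with the "range transformation" tried in the original question. The weights and the names `divide_by_max` and `range_transform` are hypothetical, and non-negative weights are assumed. The point is that the maximum is taken over the whole Weight column, and that range transformation always maps the smallest weight to 0, which would explain an attribute ending up with weight 0.

```python
# Hypothetical weights for illustration only (not values from the thread).
weights = {"att_a": 0.2, "att_b": 0.5, "att_c": 0.8}


def divide_by_max(w):
    """Scale all weights so that the largest one becomes 1."""
    top = max(w.values())  # maximum over the whole column
    return {name: value / top for name, value in w.items()}


def range_transform(w, lo=0.0, hi=1.0):
    """Min-max scaling: smallest weight maps to lo, largest to hi."""
    w_min, w_max = min(w.values()), max(w.values())
    span = w_max - w_min  # assumed non-zero in this sketch
    return {name: lo + (value - w_min) * (hi - lo) / span
            for name, value in w.items()}


by_max = divide_by_max(weights)
by_range = range_transform(weights)

for name in weights:
    print(name, round(by_max[name], 3), round(by_range[name], 3))
```

Both results have a largest value of 1, but only the range transformation forces the smallest weight to 0, so the two normalized columns are not comparable with each other.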

  • MoWeiMoWei Member Posts: 18 Maven
    Hey @mschmitz,

    perfect, thank you very much.

    The only thing that still doesn't work (also in the XML code from your first answer) is the "Rename by Replacing" operator. The "average(Weight)_" prefix unfortunately remains despite the operator. Could this be because the "Pivot" operator only outputs two attributes in its meta data, as you can see in the screenshot (although there are several if you look at the result)? The question mark confuses me a bit at this point. Is there a way to remove "average(Weight)_" from the attribute names?



    Thank you

    Best regards

    Moritz
  • mschmitzmschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,254 RM Data Scientist
    Hi,

    can you maybe quickly post the process? Did you change anything?

    Please ignore the meta data after Pivot. Unfortunately, we cannot run meta data propagation correctly after Pivot.

    BR,
    Martin
  • MoWeiMoWei Member Posts: 18 Maven
    edited September 2019

    Hi @mschmitz,

    below is the XML code again; I haven't changed anything. It is the same as the one you posted in your first answer (I have adopted the functionality in my own process, but for testing I took yours and experimented with it). The screenshot shows the final result of your process after the "Rename by Replacing" operator. The names still contain "average(Weight)_".



    <?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.3.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.3.001" expanded="true" height="68" name="Retrieve Sonar" width="90" x="112" y="136">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.3.001" expanded="true" height="124" name="Multiply" width="90" x="246" y="136"/>
          <operator activated="true" class="concurrency:parallel_random_forest" compatibility="9.3.001" expanded="true" height="103" name="Random Forest" width="90" x="380" y="340">
            <parameter key="number_of_trees" value="100"/>
            <parameter key="criterion" value="gain_ratio"/>
            <parameter key="maximal_depth" value="10"/>
            <parameter key="apply_pruning" value="false"/>
            <parameter key="confidence" value="0.1"/>
            <parameter key="apply_prepruning" value="false"/>
            <parameter key="minimal_gain" value="0.01"/>
            <parameter key="minimal_leaf_size" value="2"/>
            <parameter key="minimal_size_for_split" value="4"/>
            <parameter key="number_of_prepruning_alternatives" value="3"/>
            <parameter key="random_splits" value="false"/>
            <parameter key="guess_subset_ratio" value="true"/>
            <parameter key="subset_ratio" value="0.2"/>
            <parameter key="voting_strategy" value="confidence vote"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="enable_parallel_execution" value="true"/>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="9.3.001" expanded="true" height="68" name="Weights to Data (3)" width="90" x="514" y="391"/>
          <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes (3)" width="90" x="648" y="391">
            <list key="function_descriptions">
              <parameter key="name" value="&quot;RF&quot;"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="weight_by_correlation" compatibility="9.3.001" expanded="true" height="82" name="Weight by Correlation" width="90" x="380" y="85">
            <parameter key="normalize_weights" value="false"/>
            <parameter key="sort_weights" value="true"/>
            <parameter key="sort_direction" value="ascending"/>
            <parameter key="squared_correlation" value="false"/>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="9.3.001" expanded="true" height="68" name="Weights to Data" width="90" x="514" y="85"/>
          <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes" width="90" x="648" y="85">
            <list key="function_descriptions">
              <parameter key="name" value="&quot;Weight by Correlation&quot;"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="optimize_selection_backward" compatibility="9.3.001" expanded="true" height="103" name="Backward Elimination" width="90" x="380" y="187">
            <parameter key="maximal_number_of_eliminations" value="2"/>
            <parameter key="speculative_rounds" value="0"/>
            <parameter key="stopping_behavior" value="with decrease"/>
            <parameter key="use_relative_decrease" value="true"/>
            <parameter key="alpha" value="0.05"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:cross_validation" compatibility="9.3.001" expanded="true" height="145" name="Validation" width="90" x="45" y="30">
                <parameter key="split_on_batch_attribute" value="false"/>
                <parameter key="leave_one_out" value="false"/>
                <parameter key="number_of_folds" value="5"/>
                <parameter key="sampling_type" value="stratified sampling"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
                <parameter key="enable_parallel_execution" value="true"/>
                <process expanded="true">
                  <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.3.001" expanded="true" height="103" name="Decision Tree" width="90" x="45" y="34">
                    <parameter key="criterion" value="gain_ratio"/>
                    <parameter key="maximal_depth" value="10"/>
                    <parameter key="apply_pruning" value="true"/>
                    <parameter key="confidence" value="0.1"/>
                    <parameter key="apply_prepruning" value="true"/>
                    <parameter key="minimal_gain" value="0.01"/>
                    <parameter key="minimal_leaf_size" value="2"/>
                    <parameter key="minimal_size_for_split" value="4"/>
                    <parameter key="number_of_prepruning_alternatives" value="3"/>
                  </operator>
                  <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
                  <connect from_op="Decision Tree" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                  <description align="left" color="green" colored="true" height="80" resized="true" width="248" x="37" y="158">In the training phase, a model is built on the current training data set. (90 % of data by default, 10 times)</description>
                </process>
                <process expanded="true">
                  <operator activated="true" class="apply_model" compatibility="9.3.001" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
                    <list key="application_parameters"/>
                    <parameter key="create_view" value="false"/>
                  </operator>
                  <operator activated="true" class="performance" compatibility="9.3.001" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
                    <parameter key="use_example_weights" value="true"/>
                  </operator>
                  <connect from_port="model" to_op="Apply Model" to_port="model"/>
                  <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
                  <connect from_op="Performance" from_port="example set" to_port="test set results"/>
                  <portSpacing port="source_model" spacing="0"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_test set results" spacing="0"/>
                  <portSpacing port="sink_performance 1" spacing="0"/>
                  <portSpacing port="sink_performance 2" spacing="0"/>
                  <description align="left" color="blue" colored="true" height="103" resized="true" width="315" x="38" y="158">The model created in the Training step is applied to the current test set (10 %).&lt;br/&gt;The performance is evaluated and sent to the operator results.</description>
                </process>
                <description align="center" color="transparent" colored="false" width="126">A cross-validation evaluating a decision tree model.</description>
              </operator>
              <connect from_port="example set" to_op="Validation" to_port="example set"/>
              <connect from_op="Validation" from_port="performance 1" to_port="performance"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="weights_to_data" compatibility="9.3.001" expanded="true" height="68" name="Weights to Data (2)" width="90" x="514" y="187"/>
          <operator activated="true" class="generate_attributes" compatibility="9.3.001" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="648" y="187">
            <list key="function_descriptions">
              <parameter key="name" value="&quot;Backwards&quot;"/>
            </list>
            <parameter key="keep_all" value="true"/>
          </operator>
          <operator activated="true" class="append" compatibility="9.3.001" expanded="true" height="124" name="Append" width="90" x="782" y="136">
            <parameter key="datamanagement" value="double_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="merge_type" value="all"/>
          </operator>
          <operator activated="true" class="blending:pivot" compatibility="9.3.001" expanded="true" height="82" name="Pivot" width="90" x="916" y="136">
            <parameter key="group_by_attributes" value="Attribute"/>
            <parameter key="column_grouping_attribute" value="name"/>
            <list key="aggregation_attributes">
              <parameter key="Weight" value="average"/>
            </list>
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="default_aggregation_function" value="first"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="9.3.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="1050" y="136">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="replace_what" value="average\((\W+)\)"/>
            <parameter key="replace_by" value="$1"/>
          </operator>
          <connect from_op="Retrieve Sonar" from_port="output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Weight by Correlation" to_port="example set"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Backward Elimination" to_port="example set"/>
          <connect from_op="Multiply" from_port="output 3" to_op="Random Forest" to_port="training set"/>
          <connect from_op="Random Forest" from_port="weights" to_op="Weights to Data (3)" to_port="attribute weights"/>
          <connect from_op="Weights to Data (3)" from_port="example set" to_op="Generate Attributes (3)" to_port="example set input"/>
          <connect from_op="Generate Attributes (3)" from_port="example set output" to_op="Append" to_port="example set 3"/>
          <connect from_op="Weight by Correlation" from_port="weights" to_op="Weights to Data" to_port="attribute weights"/>
          <connect from_op="Weights to Data" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Backward Elimination" from_port="attribute weights" to_op="Weights to Data (2)" to_port="attribute weights"/>
          <connect from_op="Weights to Data (2)" from_port="example set" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Append" from_port="merged set" to_op="Pivot" to_port="input"/>
          <connect from_op="Pivot" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    BR

    Moritz
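
A small Python sketch of what the two `replace_what` patterns that appear in this thread do to a pivoted column name. It uses Python's `re` module rather than RapidMiner's own regex engine, so the replacement syntax is `\1` instead of `$1`; the column name `average(Weight)_RF` is an assumed example built from the "average(Weight)_" prefix and the "RF" label in the process. The pattern in the process just above uses `\W+`, which matches only non-word characters and therefore cannot match the letters of "Weight".

```python
import re

# Assumed column name produced by Pivot (prefix from the thread, label "RF").
column = "average(Weight)_RF"

# Pattern from the process posted just above: \W+ = one or more NON-word chars.
pattern_posted = r"average\((\W+)\)"
# Pattern from the earlier copy of the process in this thread.
pattern_earlier = r"average\((.+)\)(.+)"

# No match, so the name comes back unchanged.
unchanged = re.sub(pattern_posted, r"\1", column)
# Matches; keeps the text inside the parentheses plus the suffix.
renamed = re.sub(pattern_earlier, r"\1\2", column)
# Hypothetical alternative: strip the whole prefix literally.
prefix_removed = re.sub(r"average\(Weight\)_", "", column)

print(unchanged)
print(renamed)
print(prefix_removed)
```

If only the method label should remain, the literal-prefix variant is the simplest of the three; whether it fits depends on how the pivoted columns are named in the actual result.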

  • MoWeiMoWei Member Posts: 18 Maven
    Hey @mschmitz,
    thank you very much.
    Best regards
    Moritz
  • noritanorita Member Posts: 29 Contributor I
    Hi, a basic question about entering and interpreting &quot;...

    A section of the code was:

    expanded="true" height="82" name="Generate Attributes (2)" width="90" x="648" y="187">
            <list key="function_descriptions">
              <parameter key="name" value="&quot;Backwards&quot;"/>




    My question is: how do I enter "&quot;Backwards&quot;"?

    Entering it exactly like this does not work:
    &quot;Backwards&quot;
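
For context, `&quot;` is the standard XML escape for a double quote inside an attribute value, so it only appears in the saved process XML. The following Python sketch, using only the standard library, parses the fragment quoted above and shows the value it actually encodes. That the same plain-quoted text is what should be typed into the operator's expression field is an assumption, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The parameter line from the process XML, as quoted in the post.
fragment = '<parameter key="name" value="&quot;Backwards&quot;"/>'

# Parsing the XML turns each &quot; back into a plain double quote.
expression = ET.fromstring(fragment).get("value")
print(expression)

# Going the other way: escaping the plain text reproduces the XML form.
escaped_again = escape(expression, {'"': "&quot;"})
print(escaped_again)
```

The parsed value is the word Backwards wrapped in ordinary double quotes, i.e. a string constant, which is why typing the `&quot;` sequences literally would not be read as a quoted string.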


