How does MultipleLabelIterator work? How to apply resulting model(s)?

Legacy User Member Posts: 0 Newbie
edited November 2018 in Help

The MultipleLabelIterator applies its inner operator multiple times, changing the label each time. When used with a learning function, the learner is thus trained on each of the multiple labels.

Is the result a single model that can somehow be used to attach multiple labels to new data? If so, how? Or are we required to save each inner-loop model, then reload them to use them on new data?

--Gary

Answers

  • Legacy User Member Posts: 0 Newbie
    To clarify the second question: I see we could use macros to write out the individual models, then use the MultipleLabelIterator to read in and apply each model. But how do we end up with an ExampleSet, or multiple ExampleSets, that include _all_ of the added label predictions and confidences?
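
    A rough sketch of that write-then-reload idea might look like the fragment here. ModelWriter, ModelLoader, and their model_file parameter are assumptions that do not appear elsewhere in this thread, and so is the file name pattern. %{a} is the per-operator apply counter, so the two file names only line up if both operators run exactly once per label.

    ```xml
    <!-- Training pass (inside MultipleLabelIterator, after the learner). -->
    <!-- Assumed operator and parameter names; writes one model file per label. -->
    <operator name="ModelWriter" class="ModelWriter">
        <parameter key="model_file" value="model_%{a}.mod"/>
    </operator>

    <!-- Application pass (inside a second MultipleLabelIterator, before ModelApplier). -->
    <!-- Assumed operator and parameter names; reads the file written for the same pass. -->
    <operator name="ModelLoader" class="ModelLoader">
        <parameter key="model_file" value="model_%{a}.mod"/>
    </operator>
    ```

    This only covers saving and reloading the models; it does not by itself collect the predictions into one ExampleSet.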

    If called from code, how do you 'reach into' the MultipleLabelIterator operator to retrieve each example set prediction/confidence as it applies the labels?

    Thanks,
    Gary
  • haddock Member Posts: 849 Maven
    Hi Gary,
    > how do you 'reach into' the MultipleLabelIterator operator to retrieve each example set prediction/confidence as it applies the labels?
    I'm doing multiple labels against a fixed set of attributes, so I guess we are encountering similar problems. My answer to your question is to experiment with the process log and its Logging side-kicks. Every time the model is applied that application can be logged, and finally the log can be transformed to an example set. Any use?

    Here's some scrap code to illustrate the approach.
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ParameterIteration" class="ParameterIteration" expanded="yes">
            <list key="parameters">
              <parameter key="Label Symbol.value" value="&#39;$MNX.X&#39;,&#39;$RUI.X&#39;,&#39;$RUT.X&#39;,&#39;$RUA.X&#39;,&#39;$OEX.X&#39;,&#39;$SPX.X&#39;,&#39;$BIX.X&#39;,&#39;$CEX.X&#39;,&#39;$HCX.X&#39;,&#39;$IUX.X&#39;,&#39;$RLX.X&#39;,&#39;$SML.X&#39;,&#39;$TNX.X&#39;,&#39;$IRX.X&#39;,&#39;$TYX.X&#39;,&#39;$FVX.X&#39;,&#39;$GOX.X&#39;,&#39;$INX.X&#39;,&#39;$OIX.X&#39;,&#39;$TXX.X&#39;,&#39;$VIX.X&#39;,&#39;$DJR.X&#39;,&#39;$DJX.X&#39;,&#39;$ECM.X&#39;,&#39;$DTX.X&#39;,&#39;$DUX.X&#39;,&#39;AUDJPY&#39;,&#39;AUDUSD&#39;,&#39;CHFJPY&#39;,&#39;EURAUD&#39;,&#39;EURCHF&#39;,&#39;EURGBP&#39;,&#39;EURJPY&#39;,&#39;EURUSD&#39;,&#39;GBPCHF&#39;,&#39;GBPJPY&#39;,&#39;GBPUSD&#39;,&#39;USDCAD&#39;,&#39;USDCHF&#39;,&#39;USDJPY&#39;"/>
            </list>
            <operator name="MemoryCleanUp (5)" class="MemoryCleanUp">
            </operator>
            <operator name="Label Symbol" class="SingleMacroDefinition">
                <parameter key="macro" value="Label"/>
                <parameter key="value" value="&#39;$MNX.X&#39;"/>
            </operator>
            <operator name="For 1 to Horizon (2)" class="IteratingOperatorChain" expanded="yes">
                <parameter key="iterations" value="%{Horizon}"/>
                <!-- Wrap the apply counter %{a} into the range 1..%{Horizon} and store it in the macro TimesL -->
                <operator name="Set this Horizon (2)" class="MacroConstruction">
                    <list key="function_descriptions">
                      <parameter key="TimesL" value="if(mod(%{a},%{Horizon})==0,%{Horizon},mod(%{a},%{Horizon}))"/>
                    </list>
                </operator>
                <operator name="Set ThisH" class="SingleMacroDefinition">
                    <parameter key="macro" value="ThisHorizon"/>
                    <parameter key="value" value="%{TimesL}"/>
                </operator>
                <operator name="IORetriever (3)" class="IORetriever">
                    <parameter key="name" value="MultiLabelSet"/>
                    <parameter key="io_object" value="ExampleSet"/>
                    <parameter key="remove_from_store" value="false"/>
                </operator>
                <operator name="MaterializeDataInMemory" class="MaterializeDataInMemory">
                </operator>
                <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
                    <parameter key="name" value="%{Label}_plus_%{TimesL}"/>
                    <parameter key="target_role" value="label"/>
                </operator>
                <operator name="FeatureNameFilter" class="FeatureNameFilter">
                    <parameter key="skip_features_with_name" value=".*plus.*"/>
                </operator>
                <operator name="EqualLabelWeighting" class="EqualLabelWeighting">
                </operator>
                <operator name="ExampleFilter" class="ExampleFilter">
                    <parameter key="condition_class" value="no_missing_attributes"/>
                </operator>
                <operator name="Normalization" class="Normalization">
                    <parameter key="return_preprocessing_model" value="true"/>
                </operator>
                <operator name="Set learning parameters" class="OperatorChain" expanded="yes">
                    <operator name="Load learning parameters" class="ParameterSetLoader">
                        <parameter key="parameter_file" value="%{Label}_%{TimesL}.par"/>
                    </operator>
                    <operator name="Apply learning parameters" class="ParameterSetter">
                        <list key="name_map">
                          <parameter key="NNValidation" value="NNValidation (2)"/>
                          <parameter key="LIbSVMLearner" value="LibSVMLearner (2)"/>
                        </list>
                    </operator>
                </operator>
                <operator name="Test learning" class="SlidingWindowValidation" expanded="yes">
                    <parameter key="training_window_width" value="61"/>
                    <parameter key="training_window_step_size" value="1"/>
                    <parameter key="test_window_width" value="1"/>
                    <parameter key="horizon" value="%{Horizon}"/>
                    <operator name="Create Model" class="OperatorChain" expanded="yes">
                        <operator name="NearestNeighbors" class="NearestNeighbors">
                        </operator>
                    </operator>
                    <operator name="Test Model" class="OperatorChain" expanded="yes">
                        <operator name="Make predictions" class="ModelApplier">
                            <list key="application_parameters">
                            </list>
                        </operator>
                        <operator name="Store Actual against Predicted" class="OperatorChain" expanded="yes">
                            <!-- Expose the date, the actual label value, and the predicted value as loggable values -->
                            <operator name="Timestamp" class="Data2Log">
                                <parameter key="attribute_name" value="DD"/>
                                <parameter key="example_index" value="-1"/>
                            </operator>
                            <operator name="Actual" class="Data2Log">
                                <parameter key="attribute_name" value="%{Label}_plus_%{TimesL}"/>
                                <parameter key="example_index" value="-1"/>
                            </operator>
                            <operator name="Predicted" class="Data2Log">
                                <parameter key="attribute_name" value="prediction(%{Label}_plus_%{TimesL})"/>
                                <parameter key="example_index" value="-1"/>
                            </operator>
                            <operator name="Log Predictions" class="ProcessLog">
                                <list key="log">
                                  <parameter key="Date" value="operator.Timestamp.value.data_value"/>
                                  <parameter key="Symbol" value="operator.Label Symbol.value.macro_value"/>
                                  <parameter key="Horizon" value="operator.Set ThisH.value.macro_value"/>
                                  <parameter key="Actual" value="operator.Actual.value.data_value"/>
                                  <parameter key="Predicted" value="operator.Predicted.value.data_value"/>
                                </list>
                            </operator>
                        </operator>
                        <operator name="Compare to actual" class="ClassificationPerformance">
                            <parameter key="accuracy" value="true"/>
                            <list key="class_weights">
                            </list>
                        </operator>
                    </operator>
                </operator>
                <operator name="Write predictions and store performance" class="OperatorChain" expanded="yes">
                    <!-- Turn the accumulated "Log Predictions" log into an ExampleSet -->
                    <operator name="Convert Predictions log to examples" class="ProcessLog2ExampleSet">
                        <parameter key="log_name" value="Log Predictions"/>
                    </operator>
                    <operator name="Write Predictions to database" class="DatabaseExampleSetWriter">
                        <parameter key="database_system" value="Microsoft SQL Server (Microsoft)"/>
                        <parameter key="database_url" value="%{TargetURL}"/>
                        <parameter key="username" value="%{UserName}"/>
                        <parameter key="password" value="WR7+PADZ9jX9l2SCcYmCSmo0kmGJrM/OymvF4EHeL+4="/>
                        <parameter key="table_name" value="Predictions"/>
                        <parameter key="overwrite_mode" value="overwrite first, append then"/>
                        <parameter key="set_default_varchar_length" value="true"/>
                        <parameter key="default_varchar_length" value="20"/>
                    </operator>
                    <operator name="Log Performances" class="ProcessLog">
                        <list key="log">
                          <parameter key="Symbol" value="operator.Label Symbol.value.macro_value"/>
                          <parameter key="Horizon" value="operator.Set ThisH.value.macro_value"/>
                          <parameter key="Performance" value="operator.Test learning.value.performance"/>
                        </list>
                    </operator>
                </operator>
                <operator name="Clear up" class="OperatorChain" expanded="yes">
                    <operator name="MemoryCleanUp (4)" class="MemoryCleanUp">
                    </operator>
                    <operator name="ClearProcessLog" class="ClearProcessLog">
                        <parameter key="log_name" value="Log Predictions"/>
                    </operator>
                </operator>
                <operator name="IOConsumer (2)" class="IOConsumer">
                    <parameter key="io_object" value="PerformanceVector"/>
                </operator>
                <operator name="IOConsumer (3)" class="IOConsumer">
                    <parameter key="io_object" value="ParameterSet"/>
                </operator>
            </operator>
        </operator>
    </operator>
  • Legacy User Member Posts: 0 Newbie
    Thanks, haddock, for that excellent example! It's answering several of my questions.

    Further questions for haddock and others:

    I notice that you used an IteratingOperatorChain instead of the MultipleLabelIterator. In doing so, you create the changing attribute names with macros. In contrast, the MultipleLabelIterator changes the attribute name automatically. Is there a way to retrieve the attribute name used by the MultipleLabelIterator for the label? In particular, how can we get the name of the prediction attribute? Your example constructs the name. It seems like there should be a way to get the name from the attribute with the role 'label'.

    I also see that you store the results in the database as they are generated. Is there a way to accumulate them, adding prediction/confidence columns onto the dataset? So the end result would look like the dataset used to train with MultipleLabelIterator. Alternatively, I see that the MultipleLabelIterator is designed to accumulate result items; how do we make the predictions into a result set that MultipleLabelIterator can retain?

    Thanks again, haddock.
    Gary
  • haddock Member Posts: 849 Maven
    > Is there a way to retrieve the attribute name used by the MultipleLabelIterator for the label? In particular, how can we get the name of the prediction attribute?
    When a model is applied, a new attribute is created in the form "prediction(X)", where X is the label. You'll need to replace the brackets with a regex rename if you want to use that in constructing something nicer.
    > Alternatively, I see that the MultipleLabelIterator is designed to accumulate result items; how do we make the predictions into a result set that MultipleLabelIterator can retain?
    Take a look at the Data2Log operators in my example. Even though you're on a multiple-label setup, you should be able to bludgeon the predictions into a log in the same way.
  • Legacy User Member Posts: 0 Newbie
    > When a model is applied a new attribute is created in the form "prediction(X)" where X is the Label. You'll need to replace the brackets with a regex rename if you want to use that in constructing something nicer.
    The problem is that the Data2Log operators don't accept regular expressions for the attribute names. As you suggest, I could use a ChangeAttributeNameReplace, which does accept a regex attribute name. But if I change 'prediction(X)' to a fixed name, then I lose the name of the class I'm predicting ('X'). If I use the 'replace_by' group (i.e., $1), then I still have the same problem: a dynamic attribute name that can't be passed to Data2Log.

    Is there a way to retain the prediction column that's added onto the dataset by ModelApplier inside the MultipleLabelIterator? Could we keep the whole ExampleSet, so that at the end of the loop, all of the extra columns remain?
  • Legacy User Member Posts: 0 Newbie

    Hey, found a solution!  :)

    It turns out that IOStorer will cause the example sets to accumulate. So I told the Performance-testing operator to keep the exampleSet, then filtered out all the non-special numerical attributes. That left the ids, the labels, and the predictions/confidences. So I filtered out the special attributes of form 'label_*'. That left just the ids and predictions/confidences. Adding an IOStorer caused these stripped-down example sets to be retained to the very end, along with the final Averaged performance. I didn't even have to use any IORetrievers.

            <operator name="Performance-testing" class="Performance">
                <parameter key="keep_example_set" value="true"/>
            </operator>
            <operator name="Remove non-special attributes" class="AttributeFilter">
                <parameter key="condition_class" value="is_numerical"/>
                <parameter key="invert_filter" value="true"/>
            </operator>
            <operator name="Remove label_ attributes" class="AttributeFilter">
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="parameter_string" value="label_.*"/>
                <parameter key="invert_filter" value="true"/>
                <parameter key="apply_on_special" value="true"/>
            </operator>
            <operator name="IOStorer" class="IOStorer">
                <parameter key="name" value="predictions_%{a}"/>
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="remove_from_process" value="false"/>
            </operator>
    The final result is an example set of ids and predictions for each label in the multilabel data.
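
    For reading one of the stored sets back later in the process, a minimal sketch follows. It reuses the IORetriever parameters from haddock's example earlier in the thread. The store name predictions_1 is an assumption (it presumes %{a} expanded to 1 on the first pass), so adjust it to whatever names your IOStorer actually produced.

    ```xml
    <!-- Sketch only: fetch one stored prediction set by name. -->
    <!-- "predictions_1" is an assumed name; it depends on how %{a} was expanded. -->
    <operator name="Retrieve predictions" class="IORetriever">
        <parameter key="name" value="predictions_1"/>
        <parameter key="io_object" value="ExampleSet"/>
        <parameter key="remove_from_store" value="false"/>
    </operator>
    ```

    Keeping remove_from_store at false leaves the set in the store, so it can be retrieved again by a later operator.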

    Please let me know if there's a simpler or preferred way to do this.

    --Gary
  • haddock Member Posts: 849 Maven
    Hi Gary,

    Actually I think that all the work is done by the feature filters, like this:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="MultipleLabelGenerator" class="MultipleLabelGenerator">
        </operator>
        <operator name="MultipleLabelIterator" class="MultipleLabelIterator" expanded="yes">
            <operator name="LibSVMLearner (2)" class="LibSVMLearner">
                <parameter key="keep_example_set" value="true"/>
                <list key="class_weights">
                </list>
            </operator>
            <operator name="ModelApplier (2)" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="FeatureNameFilter" class="FeatureNameFilter">
                <parameter key="filter_special_features" value="true"/>
                <parameter key="skip_features_with_name" value="label.*|att.*"/>
            </operator>
        </operator>
    </operator>
    You still end up with separate example sets because the name of the label is not extracted, and I'm not sure it can be when the MultipleLabelIterator is used. However, all is not lost, because you can transpose the examples to expose the label names and then iterate over them instead, like this:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="MultipleLabelGenerator" class="MultipleLabelGenerator">
        </operator>
        <operator name="IOStorer" class="IOStorer">
            <parameter key="name" value="raw"/>
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="remove_from_process" value="false"/>
        </operator>
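        <!-- Transpose so that the former attribute names, including the label names, become values of the id attribute -->
        <operator name="ExampleSetTranspose" class="ExampleSetTranspose">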
        </operator>
        <operator name="ExampleFilter" class="ExampleFilter">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="id!=label.*"/>
            <parameter key="invert_filter" value="true"/>
        </operator>
        <operator name="ValueIterator" class="ValueIterator" expanded="yes">
            <parameter key="attribute" value="id"/>
            <operator name="IORetriever" class="IORetriever">
                <parameter key="name" value="raw"/>
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="remove_from_store" value="false"/>
            </operator>
            <!-- Give the column named by the current loop value the label role -->
            <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
                <parameter key="name" value="%{loop_value}"/>
                <parameter key="target_role" value="label"/>
            </operator>
            <operator name="FeatureNameFilter" class="FeatureNameFilter">
                <parameter key="filter_special_features" value="true"/>
                <parameter key="skip_features_with_name" value="label.*"/>
                <parameter key="except_features_with_name" value="%{loop_value}"/>
            </operator>
            <operator name="LibSVMLearner (2)" class="LibSVMLearner">
                <parameter key="keep_example_set" value="true"/>
                <list key="class_weights">
                </list>
            </operator>
            <operator name="ModelApplier (2)" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="ChangeAttributeName" class="ChangeAttributeName">
                <parameter key="old_name" value="%{loop_value}"/>
                <parameter key="new_name" value="Actual"/>
            </operator>
            <operator name="ChangeAttributeName (2)" class="ChangeAttributeName">
                <parameter key="old_name" value="prediction(%{loop_value})"/>
                <parameter key="new_name" value="Predicted"/>
            </operator>
            <!-- Add an empty Label column; the next operator fills it with the current label name -->
            <operator name="AttributeAdd" class="AttributeAdd">
                <parameter key="name" value="Label"/>
                <parameter key="value_type" value="polynominal"/>
            </operator>
            <operator name="MissingValueReplenishment" class="MissingValueReplenishment">
                <parameter key="default" value="value"/>
                <list key="columns">
                  <parameter key="Label" value="value"/>
                </list>
                <parameter key="replenishment_value" value="%{loop_value}"/>
            </operator>
            <operator name="FeatureNameFilter (2)" class="FeatureNameFilter">
                <parameter key="skip_features_with_name" value="att.*"/>
            </operator>
            <operator name="ExampleSetWriter" class="ExampleSetWriter">
                <parameter key="example_set_file" value="bla"/>
                <parameter key="format" value="special_format"/>
                <parameter key="special_format" value="$v[Label]$v[Actual]$v[Predicted]$d"/>
            </operator>
            <operator name="IOConsumer" class="IOConsumer">
                <parameter key="io_object" value="ExampleSet"/>
            </operator>
        </operator>
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="bla"/>
        </operator>
    </operator>
  • Legacy User Member Posts: 0 Newbie

      :D

    Transpose, then filter out everything but the labels. Then iterate over those.

    Very smart! Good technique to know!

    Gary