Repeating model building for multiple labels

keith Member Posts: 157 Maven
edited November 2018 in Help
Hi,

I have a dataset containing about 20 attributes and 6 numerical label variables I want to predict.  I would like to use the same type of modeling process (NearestNeighbor with attribute weights determined by EvolutionaryWeighting, all inside a WrapperXValidation) to predict each label, allowing the attribute weights to be optimized separately for each label.

Ideally, I could iterate through each label to predict, using the same operator structure, rather than writing out 6 slightly different operator chains.  Something like this pseudo-code:

For (predictvar in list_of_predict_vars)
    Set label = predictvar
    Do XVal - EvoWeights - NearestNeighbor model fit
    Save model and performance results for this predictvar
    Go to next predictvar
Generate predictions on original data using all 6 models

I suspect that using macros could get me close to doing this, and there seem to be some related approaches mentioned at http://rapid-i.com/rapidforum/index.php/topic,32.msg47.html and http://rapid-i.com/rapidforum/index.php/topic,35.msg64.html. But I haven't quite figured out how to iterate through a user-defined list of values, or how to change the label variable of a dataset using that list.

Any suggestions?

Thanks,
Keith

Answers

  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi Keith,

    the operator [tt]MultipleLabelIterator[/tt] was implemented for exactly that purpose. Simply load your example set, mark the labels as special attributes and give them the appropriate names "label1", ..., "label6". Then put all your model building into the meta operator. When saving the model, you may use the macro [tt]%{a}[/tt] in the file name string; it expands to the number of the current iteration of the outer operator chain.

    The models can be applied analogously afterwards.

    Hope that helps,
    Tobias
  • keith Member Posts: 157 Maven
    Perfect!  I should have known RM was already prepared to handle the task.  :) 

    Thanks, Tobias!
  • keith Member Posts: 157 Maven
    I'm running into a slight problem using the MultipleLabelIterator.  I have renamed the label variables to start with "label_", but I can't change all of them to be of role "label".  I'm using the ChangeAttributeRole operator to change one variable at a time to role "label", but only the last variable changed this way is retained.  Any variable that previously had the role "label" gets deleted from the ExampleSet data.

            <operator name="Change label_var1 to label" class="ChangeAttributeRole" breakpoints="after">
                <parameter key="name" value="label_var1"/>
                <parameter key="target_role" value="label"/>
            </operator>
    # this works, changing label_var1 from regular to label

            <operator name="Change label_var2 to label" class="ChangeAttributeRole" breakpoints="after">
                <parameter key="name" value="label_var2"/>
                <parameter key="target_role" value="label"/>
            </operator>
    # this changes label_var2 from regular to label, but it deletes label_var1 from the data!
    What am I doing wrong?  Is there a better way to change the role of a group of attributes to label?
  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi,

    there can only be one special attribute named label at a time. Nevertheless, you can mark them as special attributes label_1, label_2, etc. Without looking it up, I don't know whether the [tt]MultipleLabelIterator[/tt] checks the attribute names or their "special names". But if you both name them that way and mark them as I mentioned above, this should be sufficient.

    Regards,
    Tobias
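
    A minimal sketch of the marking step described above. The operator class and parameter keys are taken from the snippet earlier in this thread; passing "label_1" and "label_2" as target_role values is an assumption based on the reply and is not shown verbatim anywhere in the thread.

```xml
<!-- Sketch only: the target_role values label_1 and label_2 are assumed, not confirmed -->
<operator name="Mark label_var1 as label_1" class="ChangeAttributeRole">
    <parameter key="name" value="label_var1"/>
    <parameter key="target_role" value="label_1"/>
</operator>
<operator name="Mark label_var2 as label_2" class="ChangeAttributeRole">
    <parameter key="name" value="label_var2"/>
    <parameter key="target_role" value="label_2"/>
</operator>
```

    Because each attribute gets a distinct special role, none of them should displace the others the way repeated assignment of the single role "label" does.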
  • keith Member Posts: 157 Maven
    Thanks, Tobias.  I got it working.  I was confused as to whether the attribute name or its type ("special attribute") was the one that needed the label prefix.  It's the latter, of course.

    Followup question: Is there a way inside the MultipleLabelIterator inner operators to reference the current label attribute name?  The reason is that I need to convert the prediction, expressed in log-odds, back to a probability as I had previously asked about in http://rapid-i.com/rapidforum/index.php/topic,219.msg860.html.

    Thus, I need to take the prediction attribute "prediction(y)", rename it to "pred_y", then transform it by "exp(pred_y)/(1+exp(pred_y))".  Then do the same for prediction(z) -> pred_z -> exp(pred_z)/(1+exp(pred_z)).

    If I can get the current label attribute within the iterator in a macro variable, then I should be able to automate this process (assuming there are string functions that will allow me to append and/or take substrings of macro vars).

    Alternatively, if there's an easier way to accomplish what I described above, I'd be open to that as well.

    Thanks, as always.  These forums have been immensely helpful in getting me up and running with RM, and I'm most grateful.

    Keith
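
    For reference, the back-transformation described in this post is the logistic (inverse logit) function. Writing x for the predicted log-odds and p for the resulting probability (both symbols are chosen here for illustration and do not appear in the thread):

```latex
\[
p \;=\; \frac{e^{x}}{1 + e^{x}} \;=\; \frac{1}{1 + e^{-x}}
\]
```

    For example, x = 0 gives p = 0.5, and large positive x pushes p towards 1.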
  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi Keith,
    keith wrote:

    Followup question: Is there a way inside the MultipleLabelIterator inner operators to reference the current label attribute name?
    as far as I know, there is no way to directly access the complete label name via a macro. But, like every iteration operator, the [tt]MultipleLabelIterator[/tt] should define a macro which returns the number of the current iteration. Hence, if you name the labels [tt]label_1[/tt], [tt]label_2[/tt], [tt]label_3[/tt], and so on, you can access the label name by using the string [tt]label_%{a}[/tt] in parameters, where [tt]%{a}[/tt] returns the number of the current iteration.

    Hope that helps,
    Tobias
  • keith Member Posts: 157 Maven
    Thanks Tobias.  I was hoping to be able to leave the attribute name unchanged since it's more meaningful than "label_1", but what you suggest does work, and I've implemented that now.

    However, the %{a} macro doesn't seem to work inside the list of calculations in the FeatureGeneration operator.  For example, I have the following defined inside a MultipleLabelIterator node to apply a model, change the prediction column name to remove parentheses, and then calculate the probability from the predicted log-odds value:

            <operator name="Generate Predictions" class="ModelApplier">
                <list key="application_parameters">
                </list>
                <parameter key="keep_model" value="true"/>
            </operator>
            <operator name="Select prediction column" class="AttributeSubsetPreprocessing" expanded="yes">
                <parameter key="attribute_name_regex" value="prediction.*"/>
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="process_special_attributes" value="true"/>
                <operator name="Rename prediction column" class="ChangeAttributeName" breakpoints="before,after">
                    <parameter key="new_name" value="predict_%{a}"/>
                    <parameter key="old_name" value="prediction(label_%{a})"/>
                </operator>
            </operator>
            <operator name="Calculate Predicted Probability" class="FeatureGeneration" breakpoints="after">
                <list key="functions">
                  <parameter key="pred_odds_%{a}" value="exp(predict_%{a})"/>
                  <parameter key="pred_plus1_%{a}" value="+(const[1](), pred_odds_%{a})"/>
                  <parameter key="pred_prob_%{a}" value="/(pred_odds_%{a}, pred_plus1_%{a})"/>
                </list>
                <parameter key="keep_all" value="true"/>
            </operator>
    The rename of the column works fine ("prediction(label_1)" gets renamed to "predict_1").  However, the FeatureGeneration node creates new attributes named "pred_odds_%{a}", "pred_plus1_%{a}", and "pred_prob_%{a}", taking the %{a} literally, not as a macro.  Am I doing something wrong, or is RM not set up to work this way?

    Sorry to keep pestering you with these questions... but I do appreciate the help.

    Keith
  • TobiasMalbrecht Moderator, Employee, Member Posts: 295 RM Product Management
    Hi Keith,

    hm, but the function values are at least computed correctly? The problem you experienced might be due to the parameter lists. As far as I remember, macros cannot be used in parameter lists. As a workaround, you can generate the functions with generic names like [tt]pred_odds[/tt] and rename them afterwards to [tt]pred_odds_%{a}[/tt], again using the [tt]ChangeAttributeName[/tt] operator.

    Hope that solves your problem,
    Tobias
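
    The workaround above might look roughly like the following sketch. The operator classes and parameter keys are copied from the snippets earlier in this thread; the generic attribute names ("predict", "pred_odds", "pred_plus1", "pred_prob") are hypothetical placeholders. It assumes, as suggested above, that macros are expanded in ordinary parameters but not inside the "functions" list, so the list refers only to generic names.

```xml
<!-- Sketch only: generic attribute names are placeholders chosen for illustration -->
<operator name="Rename prediction to generic name" class="ChangeAttributeName">
    <parameter key="old_name" value="prediction(label_%{a})"/>
    <parameter key="new_name" value="predict"/>
</operator>
<operator name="Calculate Predicted Probability" class="FeatureGeneration">
    <list key="functions">
      <!-- no macros inside the list: only generic names are used here -->
      <parameter key="pred_odds" value="exp(predict)"/>
      <parameter key="pred_plus1" value="+(const[1](), pred_odds)"/>
      <parameter key="pred_prob" value="/(pred_odds, pred_plus1)"/>
    </list>
    <parameter key="keep_all" value="true"/>
</operator>
<operator name="Rename probability for current label" class="ChangeAttributeName">
    <!-- the macro is used in a plain parameter, where the rename earlier in the thread was reported to work -->
    <parameter key="old_name" value="pred_prob"/>
    <parameter key="new_name" value="pred_prob_%{a}"/>
</operator>
```

    As in the earlier post, the first rename may need to sit inside an AttributeSubsetPreprocessing operator with process_special_attributes enabled, since the prediction column is a special attribute. The intermediate attributes "predict", "pred_odds" and "pred_plus1" would probably also need to be renamed or removed in each iteration so that the generic names do not collide in the next one.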