Repeating model building for multiple labels
Hi,
I have a dataset containing about 20 attributes and 6 numerical label variables I want to predict. I would like to use the same type of modeling process (NearestNeighbor with attribute weights determined by EvolutionaryWeighting, all inside a WrapperXValidation) to predict each label, allowing the attribute weights to be optimized separately for each label.
Ideally, I could iterate through each label to predict, using the same operator structure, rather than writing out 6 slightly different operator chains. Something like this pseudo-code:
For (predictvar in list_of_predict_vars)
    Set label = predictvar
    Do XVal - EvoWeights - NearestNeighbor model fit
    Save model and performance results for this predictvar
    Go to next predictvar
Generate predictions on original data using all 6 models
I suspect that using macros could get me close to doing this, and there seem to be some related approaches mentioned at http://rapid-i.com/rapidforum/index.php/topic,32.msg47.html and http://rapid-i.com/rapidforum/index.php/topic,35.msg64.html. But I haven't quite figured out how to iterate through a user-defined list of values and change the label variable of a dataset using that list.
Any suggestions?
Thanks,
Keith
Answers
The operator [tt]MultipleLabelIterator[/tt] was implemented for exactly that purpose. Simply load your example set, mark the labels as special attributes and give them the appropriate names "label1", ..., "label6". Then put all your model building into the meta operator. When saving the model, you can use the macro [tt]%{a}[/tt] in the file name string; it expands to the number of the current iteration of the outer operator chain.
The models can be applied analogously afterwards.
Hope that helps,
Tobias
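For readers who want to see the iteration spelled out, here is a minimal Python sketch of the same idea outside RapidMiner. It is only an illustration under stated assumptions: `fit_mean_model` is a hypothetical stand-in for the real XVal / EvoWeights / NearestNeighbor chain, the `model_<a>.mod` file names are made up, and the counter `a` is assumed to start at 1 in the way [tt]%{a}[/tt] is described above.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]
Model = Callable[[Row], float]


def fit_mean_model(rows: Sequence[Row], label: str) -> Model:
    """Hypothetical stand-in for the real learner: always predicts the label mean."""
    mean = sum(row[label] for row in rows) / len(rows)
    return lambda row: mean


def fit_per_label(rows: Sequence[Row], labels: List[str]) -> Dict[str, Model]:
    """Fit one model per label, keyed by a file name that carries the iteration number."""
    models: Dict[str, Model] = {}
    for a, label in enumerate(labels, start=1):  # 'a' plays the role of %{a} (assumed 1-based)
        models[f"model_{a}.mod"] = fit_mean_model(rows, label)
    return models


# Tiny made-up example set with one regular attribute and two labels.
rows = [
    {"x": 1.0, "label1": 2.0, "label2": 10.0},
    {"x": 2.0, "label1": 4.0, "label2": 30.0},
]
models = fit_per_label(rows, ["label1", "label2"])
print(sorted(models))
```

Each label gets its own independently fitted model, which is what allows the attribute weights to differ per label in the real process.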
Thanks, Tobias!
There can only be one special attribute named label at a time. Nevertheless, you can mark them as special attributes label_1, label_2, etc. Without looking it up, I don't know whether the [tt]MultipleLabelIterator[/tt] checks for the attribute names or their "special names". But if you both name them that way and mark them as I mentioned above, this should be sufficient.
Regards,
Tobias
Followup question: Is there a way inside the MultipleLabelIterator inner operators to reference the current label attribute name? The reason is that I need to convert the prediction, expressed in log-odds, back to a probability as I had previously asked about in http://rapid-i.com/rapidforum/index.php/topic,219.msg860.html.
Thus, I need to take the prediction attribute "prediction(y)", rename it to "pred_y", and then transform it by "exp(pred_y)/(1+exp(pred_y))". Then do the same for prediction(z) -> pred_z -> exp(pred_z)/(1+exp(pred_z)).
If I can get the current label attribute within the iterator in a macro variable, then I should be able to automate this process (assuming there are string functions that will allow me to append and/or take substrings of macro vars).
Alternatively, if there's an easier way to accomplish what I described above, I'd be open to that as well.
Thanks, as always. These forums have been immensely helpful in getting me up and running with RM, and I'm most grateful.
Keith
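The rename-then-transform step described in this post can be sketched in Python as follows. This is an illustration only, not RapidMiner syntax: the helper names and the `prob_<label>` column are assumptions introduced for the example, while the formula exp(x)/(1+exp(x)) and the "prediction(y)" to "pred_y" rename come from the post.

```python
import math
from typing import Dict


def log_odds_to_probability(log_odds: float) -> float:
    """Inverse logit, exp(x) / (1 + exp(x)), arranged to avoid overflow for large |x|."""
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


def add_probability(row: Dict[str, float], label: str) -> Dict[str, float]:
    """Rename 'prediction(<label>)' to 'pred_<label>' and add 'prob_<label>' (hypothetical name)."""
    out = dict(row)
    log_odds = out.pop(f"prediction({label})")
    out[f"pred_{label}"] = log_odds
    out[f"prob_{label}"] = log_odds_to_probability(log_odds)
    return out


# Made-up predictions in log-odds for two labels, y and z.
row = {"prediction(y)": 0.0, "prediction(z)": 2.0}
for label in ("y", "z"):
    row = add_probability(row, label)
print(row)
```

Because the label name is passed in as a parameter, the same two steps cover every label without writing a separate chain for each one.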
Hope that helps,
Tobias
However, the %{a} macro doesn't seem to be usable inside the list of calculations in the FeatureGenerator. For example, I have the following defined inside a MultipleLabelIterator node to apply a model, change the prediction column name to remove parentheses, and then calculate the probability from the predicted log-odds value: The rename of the column works fine ("prediction(label_1)" gets renamed to "predict_1"). However, the FeatureGeneration node creates new attributes named "pred_odds_%{a}", "pred_plus1_%{a}", and "pred_prob_%{a}", taking the %{a} literally, not as a macro. Am I doing something wrong, or is RM not set up to work this way?
Sorry to keep pestering you with these questions... but I do appreciate the help.
Keith
Hm, but the function values are at least computed correctly? The problem you experienced might be due to the parameter lists. As far as I remember, macros cannot be used in parameter lists. As a workaround, you can generate functions with generic names like [tt]pred_odds[/tt], which you then change to [tt]pred_odds_%{a}[/tt], again using the [tt]ChangeAttributeName[/tt] operator.
Hope that solves your problem,
Tobias
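The workaround above (generate under generic names first, then rename with the iteration number) can be sketched in Python like this. It is a rough analogy rather than RapidMiner code: the attribute names are taken from the thread, but the two helper functions and the sample log-odds values are assumptions made for illustration.

```python
import math
from typing import Dict

GENERIC_NAMES = ("pred_odds", "pred_plus1", "pred_prob")


def generate_features(row: Dict[str, float], prediction_name: str) -> Dict[str, float]:
    """Step 1: compute the new attributes under fixed, generic names."""
    out = dict(row)
    out["pred_odds"] = math.exp(out[prediction_name])
    out["pred_plus1"] = 1.0 + out["pred_odds"]
    out["pred_prob"] = out["pred_odds"] / out["pred_plus1"]
    return out


def rename_with_iteration(row: Dict[str, float], a: int) -> Dict[str, float]:
    """Step 2: append the iteration number to the generic names only."""
    return {
        (f"{name}_{a}" if name in GENERIC_NAMES else name): value
        for name, value in row.items()
    }


# Made-up log-odds predictions for two labels, already renamed to predict_<a>.
row = {"predict_1": 0.0, "predict_2": 1.0}
for a in (1, 2):  # 'a' stands in for the %{a} iteration counter
    row = rename_with_iteration(generate_features(row, f"predict_{a}"), a)
print(sorted(row))
```

Renaming inside each iteration matters here: it frees the generic names so the next pass can reuse them without overwriting the previous label's results.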