
RemoveCorrelatedFeatures & unseen datasets

falcor781 Member Posts: 2 Contributor I
edited November 2018 in Help
We are using RapidMiner for much of our Educational Data Mining Research and have a question for you:

We frequently use "RemoveCorrelatedFeatures" on our "training" datasets to generate models with a subset of (only) relevant attributes. However, we're running into some difficulties when trying to apply our model on our "unseen" datasets.

After we build a model on our "training" dataset with correlated features removed, we want to apply it to "unseen" data from a different dataset. To keep the unseen data compatible with the model, however, we have to remove by hand the same attributes that RemoveCorrelatedFeatures removed from the training set.

Is there a way to do this in RapidMiner more easily without having to do something outside of RapidMiner?

Thanks in advance!

Answers

  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    yes, of course  ;)

    You could extract weights from your resulting training data set with the operator "Data to Weights". This produces a new attribute weights object with a weight of 1 for every remaining attribute. You can then apply those weights to the testing data set with the operator "Select by Weights", which takes the test data together with the extracted weights. Make sure that the parameter "deselect unknown" is set to true (this is the default, but it is worth checking).

    Cheers,
    Ingo
  • falcor781 Member Posts: 2 Contributor I
    Thank you for the quick response! I actually tried your idea (modified slightly to work with RapidMiner 4.6) and it worked out. I just realized that I hadn't thanked you properly for the help :)

    You've made many people in our lab happy.