Merging feature sets
Hello!
I developed a process that consists of the following steps:
1) Read dataset
2) Split it into a test set and a training set
3) Do feature selection on the training set using SVM, Forward Selection, X-Val and OptimizeParameters
4) Build a model using the selected parameters
5) Apply the resulting model (that is, the one built with the best features) to the test set
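The hold-out workflow in the list above can be sketched in Python roughly as follows. This is a minimal illustration, not the RapidMiner process itself: `holdout_split`, `forward_selection` and `project` are hypothetical helpers, and `score` is a hypothetical stand-in for the cross-validated SVM accuracy (X-Val plus OptimizeParameters) that would be computed on the training rows only. A toy scoring table replaces the SVM so the sketch runs without any machine-learning library.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]


def holdout_split(rows: Sequence[Row], test_fraction: float, seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Shuffle a copy of the rows and split it into (training, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def forward_selection(features: Sequence[str], score: Callable[[List[str]], float]) -> List[str]:
    """Greedy forward selection: add the feature that raises the score most, stop when none does."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining:
        # Score every one-feature extension of the current subset and take the best.
        top_score, top_feature = max((score(selected + [f]), f) for f in remaining)
        if top_score <= best_score:
            break  # no candidate improves the current subset
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected


def project(rows: Sequence[Row], columns: Sequence[str]) -> List[Row]:
    """Keep only the given columns in every row (the shape the model expects)."""
    return [{c: row[c] for c in columns} for row in rows]


# Toy data; the real score would be X-Val SVM accuracy computed on `train` only.
rows = [{"id": i, "f1": i % 3, "f2": i % 5, "f3": i % 7} for i in range(10)]
train, test = holdout_split(rows, test_fraction=0.3)

usefulness = {"f1": 9, "f2": 1, "f3": 6}  # hypothetical per-feature values


def toy_score(subset: List[str]) -> float:
    """Hypothetical score: each feature adds its usefulness minus a fixed cost of 3."""
    return sum(usefulness[f] - 3 for f in subset)


selected = forward_selection(["f1", "f2", "f3"], toy_score)
test_for_model = project(test, ["id"] + selected)
print(selected)  # ['f1', 'f3']
```

In a real setup only `train` would ever be passed to the scoring step, so the test rows stay out of modelling, and `project` gives the test set exactly the columns the final model was built with.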
The problem is that SVM classifiers expect the test set to have exactly the same features that were used to build the model; otherwise the results are wrong. However, I have not managed to filter out the features of the test set that were not among the selected ones.
Stated more concisely: given two datasets A and B, where the features of B are a subset of the features of A, I need a dataset C that contains the data of A but only the features A shares with B:
Dataset A
ID | Feature 1 | Feature 2 | Feature 3
1  | 3         | 9         | 2
2  | 5         | 3         | 1
Dataset B
ID | Feature 1 | Feature 3
35 | 1         | 0
41 | 2         | 9
29 | 2         | 9
Dataset C
ID | Feature 1 | Feature 3
1  | 3         | 2
2  | 5         | 1
I am doing things this way (instead of using only X-Val) so as to guarantee that my test set is not used at all during the modelling process.
If somebody has a clue how to do this (or thinks I should do it another way), I would be very grateful!
Best regards,
Vinicius
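The operation asked for above (dataset C is the rows of A restricted to the columns that also appear in B) is essentially a column intersection. A minimal Python sketch with plain dictionaries and the example tables from the question; `restrict_to_shared_features` is a hypothetical helper name, and keeping A's column order is my own choice, not something the thread requires.

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]


def restrict_to_shared_features(
    a_rows: Sequence[Row], a_columns: Sequence[str], b_columns: Sequence[str]
) -> List[Row]:
    """Return A's rows, keeping only the columns that also occur in B (in A's column order)."""
    wanted = set(b_columns)
    shared = [c for c in a_columns if c in wanted]
    return [{c: row[c] for c in shared} for row in a_rows]


# Dataset A and the header of dataset B, taken from the example tables.
a_columns = ["ID", "Feature 1", "Feature 2", "Feature 3"]
a_rows = [
    {"ID": 1, "Feature 1": 3, "Feature 2": 9, "Feature 3": 2},
    {"ID": 2, "Feature 1": 5, "Feature 2": 3, "Feature 3": 1},
]
b_columns = ["ID", "Feature 1", "Feature 3"]

dataset_c = restrict_to_shared_features(a_rows, a_columns, b_columns)
for row in dataset_c:
    print(row)
```

If the data were held in pandas DataFrames instead, `a[b.columns]` gives the same result (in B's column order) and raises a `KeyError` if B has a column that A lacks.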
Answers
You could try the "Data to Weights" and "Select by Weights" operators. See the enclosed example.
Regards,
Andrew
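A rough Python analogue of the two-step approach suggested above, just to show the idea: weights are derived from the reduced data (B) and then used to filter the full data (A). This is not RapidMiner's implementation; `data_to_weights` and `select_by_weights` are hypothetical stand-ins, and the behaviour they model (every attribute of the input gets weight 1, attributes without a weight are dropped, a "greater or equal" threshold) is an assumption about how the operators would be configured for this task.

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]


def data_to_weights(columns: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for "Data to Weights": weight 1.0 for every attribute of the input."""
    return {column: 1.0 for column in columns}


def select_by_weights(
    rows: Sequence[Row],
    columns: Sequence[str],
    weights: Dict[str, float],
    min_weight: float = 1.0,
) -> List[Row]:
    """Hypothetical stand-in for "Select by Weights" with a 'greater or equal' rule.

    Attributes that have no weight at all are treated as deselected (assumption).
    """
    kept = [c for c in columns if c in weights and weights[c] >= min_weight]
    return [{c: row[c] for c in kept} for row in rows]


# Weights come from the reduced data (dataset B); the selection is applied to dataset A.
b_columns = ["ID", "Feature 1", "Feature 3"]
a_columns = ["ID", "Feature 1", "Feature 2", "Feature 3"]
a_rows = [
    {"ID": 1, "Feature 1": 3, "Feature 2": 9, "Feature 3": 2},
    {"ID": 2, "Feature 1": 5, "Feature 2": 3, "Feature 3": 1},
]

weights = data_to_weights(b_columns)
dataset_c = select_by_weights(a_rows, a_columns, weights)
print(dataset_c)
```

The appeal of routing the selection through weights is that the feature list travels as a separate object, so the same weights produced on the training side can be applied to any other data set, including the held-out test set.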
Thanks a lot for the answer and for the example! It worked, and I was able to complete my process.
Best regards,
Vinicius