Ensemble method for multiple data sets
Hi,
I am currently working in RapidMiner 4.6.
I have extracted features from my main data set into 2 sets of features. One is a word vector (53 features) and the other is a set with 10 different features.
I have 2 different classifiers that I would like to combine in an ensemble method:
Logistic Regression on the word vector
W-J48graft on a different set of features
From my understanding, I can only use operators such as stacking and voting if I give them one and the same data set as input.
How would I go about combining predictions from both my data sets using an ensemble method?
Thank you in advance!
Answers
Ensemble methods require that your features (in your case the word vector and the 10-feature set) are part of every single instance in your data set. Therefore you could join them together into one example set and, inside the Vote or Stacking operator, use one attribute filter per learner to hide the unwanted features from that learner. Please note that the Vote operator performs majority voting for classification tasks, so you might need more than two learners inside...
BUT what is really not possible is to combine a regression learner with a classification approach (unless you have both a categorical label and a corresponding numerical value for each example, which is a rather bizarre setup).
Here is an example of a simple stacking approach with different inputs for the base learners:
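For illustration only, the same join-then-filter idea can be sketched outside RapidMiner in Python with scikit-learn. The toy data, the column names, and the plain decision tree standing in for W-J48graft are all assumptions here, not the original process:

```python
# Sketch of the idea: join both feature sets into one table, then give each
# base learner an "attribute filter" so it only sees its own columns inside
# a stacking ensemble. Illustrative stand-in, not the RapidMiner 4.6 process.
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy data in place of the real feature sets (assumed shapes: 53 + 10 features)
rng = np.random.default_rng(0)
n = 200
word_vector_df = pd.DataFrame(rng.random((n, 53)),
                              columns=[f"w{i}" for i in range(53)])
extra_df = pd.DataFrame(rng.random((n, 10)),
                        columns=[f"f{i}" for i in range(10)])
y = rng.integers(0, 2, n)                      # toy binary label

# Join both feature sets into one example set (same examples, same order)
examples = word_vector_df.join(extra_df)
word_cols = list(word_vector_df.columns)       # 53 word-vector attributes
other_cols = list(extra_df.columns)            # 10 additional attributes

# One "attribute filter" per learner: ColumnTransformer passes through only
# the wanted columns and drops the rest before the learner sees the data.
logreg_on_words = Pipeline([
    ("select", ColumnTransformer([("words", "passthrough", word_cols)])),
    ("clf", LogisticRegression(max_iter=1000)),
])
tree_on_others = Pipeline([
    ("select", ColumnTransformer([("others", "passthrough", other_cols)])),
    ("clf", DecisionTreeClassifier()),         # stand-in for W-J48graft
])

# Stacking ensemble: both base learners feed a meta-learner
stack = StackingClassifier(
    estimators=[("logreg_words", logreg_on_words),
                ("tree_others", tree_on_others)],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(examples, y)
predictions = stack.predict(examples)
```

The key point is the same as in the RapidMiner process: the ensemble operator gets the full joined example set, and the per-learner filters decide which attributes each base learner actually trains on.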
Cheers,
Helge