Test existing model on a different dataset
viktorvanbeerse
Hi,
I have two datasets which are very similar (same attributes & label), yet one of them is incomplete. The assignment is to develop predictive models (Decision Tree and Logistic Regression) with the "incomplete" data and to validate them on the other dataset. So the goal is to train the model on the "incomplete" dataset and to use the "complete" dataset as the test set. Does anybody know whether this can be modeled with cross-validation and a performance evaluation?
Thank you in advance
Viktor
Answers
This sounds backwards. To train a classification model you need a label, which means you already have some 'truth' in a historical data set. For example, you might have a training data set labeled churn or loyal. You train on that, then use the "incomplete" data set as your scoring set, and applying the model to it generates the predictions.
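A minimal sketch of the train-then-score workflow described above, written in Python rather than as a RapidMiner process (roughly what a learner followed by an Apply Model step does). The customer attributes (support calls, years as customer), the toy data and all helper names are hypothetical; the learner is a plain logistic regression fitted by batch gradient descent.

```python
import math

# Hypothetical historical data with known labels: [support_calls, years_as_customer]
train_rows = [[5, 1], [6, 0.5], [7, 2], [1, 6], [0, 8], [2, 7]]
train_labels = ["churn", "churn", "churn", "loyal", "loyal", "loyal"]

# Hypothetical scoring set: same attributes, but no label column
scoring_rows = [[6, 1], [1, 7]]

LABEL_TO_INT = {"loyal": 0, "churn": 1}
INT_TO_LABEL = {v: k for k, v in LABEL_TO_INT.items()}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(model, rows, threshold=0.5):
    """Apply a trained model to rows; returns 0/1 class predictions."""
    w, b = model
    return [
        1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold else 0
        for x in rows
    ]


# Train only on the labeled historical data ...
model = train_logistic_regression(train_rows, [LABEL_TO_INT[y] for y in train_labels])
# ... then apply the model to the unlabeled scoring set to generate predictions
predictions = [INT_TO_LABEL[p] for p in predict(model, scoring_rows)]
print(predictions)
```

The scoring set never needs a label column; the label only has to exist in the data the model is trained on.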
Hi,
not sure what you mean by incomplete data, but assuming it means that some attributes have missing values, it should be straightforward, as long as your training data has sufficient values for the desired labels. See if the sample process below does what you want.
You may need to do some pre-processing, though, depending on the learning algorithm you choose.
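A minimal sketch of the setup from the question, again in Python rather than as a RapidMiner process. Missing values in the "incomplete" training set are replaced by column means computed on that training set only (similar in spirit to a Replace Missing Values step). A model is then trained on it and evaluated once on the separate "complete" set. Because the training and test sets are fixed in advance, this is a plain hold-out evaluation rather than cross-validation. The data are hypothetical, missing values are assumed to be encoded as `None`, and the learner is a one-level decision tree (a decision stump) standing in for a full Decision Tree.

```python
# Hypothetical "incomplete" training set (None = missing value) and "complete" test set.
train_rows = [[5, None], [6, 0.5], [None, 2], [1, 6], [0, None], [2, 7]]
train_labels = [1, 1, 1, 0, 0, 0]
test_rows = [[7, 1], [4, 3], [1, 8], [3, 5]]
test_labels = [1, 1, 0, 0]


def fit_column_means(rows):
    """Mean of each column over its non-missing training values (0.0 if all missing)."""
    means = []
    for j in range(len(rows[0])):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(sum(present) / len(present) if present else 0.0)
    return means


def impute(rows, means):
    """Replace each None with the training mean of its column."""
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def fit_stump(rows, labels):
    """One-level decision tree: the (feature, threshold, class-above) split
    with the fewest training errors."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            for above in (0, 1):
                preds = [above if r[j] > t else 1 - above for r in rows]
                errors = sum(p != y for p, y in zip(preds, labels))
                if best is None or errors < best[0]:
                    best = (errors, j, t, above)
    _, j, t, above = best
    return j, t, above


def predict_stump(stump, rows):
    """Predict 'above' when the chosen feature exceeds the threshold, else the other class."""
    j, t, above = stump
    return [above if r[j] > t else 1 - above for r in rows]


means = fit_column_means(train_rows)  # learned on the training set only
stump = fit_stump(impute(train_rows, means), train_labels)

# Apply the same preprocessing to the test set, then evaluate once on it (hold-out, no CV).
test_preds = predict_stump(stump, impute(test_rows, means))
accuracy = sum(p == y for p, y in zip(test_preds, test_labels)) / len(test_labels)
print(stump, accuracy)
```

Here one of the four test rows is misclassified, so the reported accuracy is 0.75. The imputation means come from the training data and are reused unchanged on the test data, so no information from the test set leaks into the model.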