Irresolute outcome (usage of trained data model to classified data)

birteisabel_kuh Member Posts: 1 Newbie
Dear Rapidminer-Community,

we are currently working on a university project, and our results are making us a little desperate. 
Our task is to build a model from training data and use it to categorize data that still has to be classified. The data is about customer return rates. The training data is labeled with the categories H, N and U; for example, about 1,300 of the 20,000 examples are U. When we applied the model to the data to be classified, it predicted 0 U with the setting 
criterion: gain_ratio and only 24 U with the setting criterion: accuracy. 
Since this doesn't seem plausible, we don't know where the mistake could be. 
We used the following operators:
Retrieve Train Data, Set Role, Split Data, Decision Tree and Apply Model; at the end we connected Retrieve classified data to the model. 

Is there an "easy" way to solve this problem? Or are there any special settings for the decision tree that we should use to get clearer results? 

We would be very grateful for a helpful answer!
Thanks in advance!


  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hi @birteisabel_kuh

    Just to clarify: when you classify the data set, is your model unable to predict the U label? If so, I see that you are using a random split, which can produce an unrepresentative sample. Can you try the Cross Validation operator with 5 folds and see how the model performs? Because it uses all of the data for both training and testing, you can see whether the performance improves.
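To illustrate the idea behind 5-fold cross-validation on imbalanced labels, here is a minimal sketch in plain Python. It is not RapidMiner's implementation. The function `stratified_folds` is a hypothetical helper, and the even split between H and N is an assumption, since the question only gives the U count (about 1,300 of 20,000). The sketch assigns examples to folds class by class, so every fold keeps roughly the same share of U.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Assign each example index to one of k folds, class by class,
    so that every fold keeps roughly the overall class proportions."""
    rng = random.Random(seed)

    # Group example indices by their label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal the indices out round-robin.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


# Assumed label distribution: 1,300 U as in the question;
# the H/N split (9,350 each) is made up for illustration.
labels = ["U"] * 1300 + ["H"] * 9350 + ["N"] * 9350

folds = stratified_folds(labels, k=5)
for i, fold in enumerate(folds):
    counts = Counter(labels[j] for j in fold)
    # Each fold serves once as the test set; the other four are used for training.
    print(f"fold {i}: size={len(fold)}, U={counts['U']}")
```

With only about 6.5% of the examples labeled U, a purely random split can leave U under-represented in one partition, so looking at per-class results across all folds gives a steadier picture than a single split.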


