Binary text classification: help needed with the cross-validation process
We want to do binary classification on a text data set with a class distribution of 80% negative and 20% positive. To get the most statistically reliable performance estimate, we want to use 10-fold cross-validation.
When we model this in RapidMiner, it fails: the process does not output any performance metrics (precision, recall, etc.).
We found a workaround that works, but it does not make sense from an ML perspective: if we first split the data into training and test sets and then apply 10-fold cross-validation, we get results. But the training/test split should already be part of the cross-validation (9 training folds, 1 test fold, 10 iterations). So right now the only way to get this working is to FIRST split into test and training sets and THEN use X-Validation. Did we model it the right way, or did we miss something?
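To make clear what we expect cross-validation to do on its own, here is a minimal sketch in plain Python (not RapidMiner). Everything in it is an assumption for illustration only: the toy data with an 80/20 class split, the helper names `stratified_k_folds`, `train`, `predict` and `precision_recall`, and the trivial keyword "classifier" that stands in for a real text model. The point is only the loop structure: each of the 10 folds is the test fold exactly once, the other 9 are used for training, and precision and recall are computed from the pooled test predictions, with no separate training/test split beforehand.

```python
import random
from collections import defaultdict


def stratified_k_folds(labels, k=10, seed=0):
    """Yield (train_idx, test_idx); every example is in a test fold exactly once."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal indices round-robin so each fold keeps roughly the class ratio.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


def train(texts, labels):
    """Hypothetical stand-in model: words seen only in positive training texts."""
    pos_words = {w for t, y in zip(texts, labels) if y == 1 for w in t.split()}
    neg_words = {w for t, y in zip(texts, labels) if y == 0 for w in t.split()}
    return pos_words - neg_words


def predict(model, text):
    """Predict positive (1) if the text contains any positive-only word."""
    return 1 if any(w in model for w in text.split()) else 0


def precision_recall(y_true, y_pred, positive=1):
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (made up): 80% negative, 20% positive.
labels = [0] * 80 + [1] * 20
texts = ["order arrived fine"] * 80 + ["order arrived broken"] * 20

folds = list(stratified_k_folds(labels, k=10))
y_pred = [None] * len(labels)
for train_idx, test_idx in folds:
    # Train on the 9 training folds only.
    model = train([texts[i] for i in train_idx], [labels[i] for i in train_idx])
    # Predict on the single held-out fold.
    for i in test_idx:
        y_pred[i] = predict(model, texts[i])

# Metrics over the pooled held-out predictions from all 10 iterations.
precision, recall = precision_recall(labels, y_pred)
print("precision:", precision, "recall:", recall)
```

This is the behaviour we would expect from the cross-validation step by itself: it should hand us the test-fold predictions (or the metrics computed from them) without us splitting the data manually first.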
If you need any more information to help us, just leave a comment.
Thank you very much in advance.