Model selection for imbalanced training dataset
I'm doing model selection for an SVM using the "Optimize Parameters (Grid)" operator. My training dataset is imbalanced (782 positive examples and 2048 negative examples), so I cannot use Accuracy (= (TP+TN)/(TP+TN+FP+FN)) as the score for model selection: a predictor that labels everything as negative already reaches an accuracy of 2048/(2048+782) ≈ 72.4%. Is there a way to select on Precision and Recall, or on a combination of them such as the F1 score, instead of Accuracy? I looked through the parameter list of the Performance operator but could not find those scores. Or is there another way to deal with an imbalanced dataset like this?
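To make the concern concrete, here is a minimal sketch in plain Python (not RapidMiner) that scores the trivial "predict everything as negative" baseline on a dataset with the same class counts. The helper names `confusion_counts` and `scores` are hypothetical and only for illustration, and treating a 0/0 precision as 0.0 is an assumption of this sketch.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def scores(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1; undefined ratios are set to 0.0 (assumption)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Same class counts as in the question: 782 positive, 2048 negative.
y_true = [1] * 782 + [0] * 2048

# Trivial baseline: predict the majority (negative) class for every example.
all_negative = [0] * len(y_true)

baseline = scores(*confusion_counts(y_true, all_negative))
print(baseline)
```

The baseline gets a high accuracy while its recall and F1 are both zero, which is why an F1-style score looks like the better criterion for the grid search here.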
I have attached my process file. In this process, I use the "Optimize Parameters (Grid)" operator to find the SVM hyperparameters that give the best cross-validation performance. The process works very well on a balanced training dataset, and I would like to know how to modify it for an imbalanced one. Thank you very much for your help!