Sampling (Balancing) and Cross Validation
Hey everyone,

I want to train a decision tree model, and I already use a cross validation operator to train it. However, I also need to balance my data, because I have two classes and one of them is represented far less often than the other.

I am now unsure where to put the sampling operator. I know how to use it to balance my data. What I am wondering is whether it matters if I put the sampling operator into the subprocess of the cross validation operator, or if I can just as well balance the dataset right before the cross validation.

I saw somewhere that it is typical, and better, to use the sampling operator inside the cross validation operator, because otherwise some data points get out of scope. But does it really matter? When I think about it again, that does not make much sense to me, and it should not matter whether I sample before or inside the cross validation.

Can someone give me an answer about this?
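To make the two setups concrete, here is a rough sketch of what I mean. It is plain Python rather than operators, and everything in it is an assumption for illustration: the toy data (90 rows of one class, 10 of the other), the helper names `downsample`, `k_folds`, and `fold_label_counts`, and the choice of downsampling as the balancing method. It does not train a model; it only counts which class labels end up in the test folds in each setup.

```python
import random
from collections import Counter

def downsample(rows, rng):
    """Randomly drop rows until every class is as small as the rarest one."""
    by_label = {}
    for row in rows:
        by_label.setdefault(row[1], []).append(row)
    n_min = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced

def k_folds(rows, k):
    """Yield (train, test) pairs; fold i tests on every k-th row starting at i."""
    for i in range(k):
        test = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, test

def fold_label_counts(rows, k, rng, balance_inside):
    """Return the class counts seen in each test fold."""
    counts = []
    for train, test in k_folds(rows, k):
        if balance_inside:
            # Only the training part is balanced; the test fold is left alone.
            train = downsample(train, rng)
        # A model would be fit on `train` and scored on `test` here.
        counts.append(Counter(label for _, label in test))
    return counts

rng = random.Random(0)
# Hypothetical toy data: 90 rows of class "neg", 10 rows of class "pos".
data = [(i, "neg") for i in range(90)] + [(i, "pos") for i in range(90, 100)]
rng.shuffle(data)

# Setup A: balance the whole dataset first, then cross validate.
before = fold_label_counts(downsample(data, rng), k=5, rng=rng, balance_inside=False)
# Setup B: cross validate the raw data and balance inside each fold.
inside = fold_label_counts(data, k=5, rng=rng, balance_inside=True)

total_before = sum(before, Counter())  # test rows over all folds: 10 of each class
total_inside = sum(inside, Counter())  # test rows over all folds: 90 "neg", 10 "pos"
print(total_before)
print(total_inside)
```

So in this sketch, sampling before the cross validation means the model is only ever tested on balanced data, while sampling inside means it is tested on the original class ratio. That is the difference I can see; I am not sure whether it is the reason people recommend the second setup.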