I'm working on a random forest model to predict a binary label. The dataset is imbalanced, with roughly 70% of the samples in one class and 30% in the other. The attributes are numeric and represent financial statement ratios or amounts in euros, such as EBITDA.
The process is: read the data; keep only the features with less than 10% missing values; normalize with a z-score transformation; replace the remaining missing values with the column mean; then run cross-validation, undersampling the majority class in the training data, with a random forest that uses information gain as the split criterion (200 trees, maximum depth 15).
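For reference, here is a minimal scikit-learn sketch of the process described above. Everything specific in it is an assumption: synthetic data from `make_classification` stands in for the financial data, the hypothetical helper `undersample` does plain random undersampling, and 5 stratified folds are used. `criterion="entropy"` is scikit-learn's information-gain criterion. Imputation is placed before scaling here; with mean imputation the filled values end up at zero after standardization either way. The undersampling and preprocessing are done inside each fold so the test fold keeps the real class ratio and no statistics leak from it.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (assumption): ~70/30 class split.
X_arr, y = make_classification(n_samples=1000, n_features=20,
                               weights=[0.7, 0.3], random_state=0)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(20)])
# Blank out ~5% of all values, plus about half of column f0, to mimic missing data.
X = X.mask(rng.random(X.shape) < 0.05)
X.loc[rng.random(len(X)) < 0.5, "f0"] = np.nan

# Keep only features with less than 10% missing values (drops f0 here).
X = X.loc[:, X.isna().mean() < 0.10]


def undersample(idx, y, rng):
    """Hypothetical helper: randomly drop rows so every class has the minority count."""
    classes, counts = np.unique(y[idx], return_counts=True)
    n_min = counts.min()
    keep = [rng.choice(idx[y[idx] == c], size=n_min, replace=False) for c in classes]
    return np.concatenate(keep)


scores = []
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    # Undersample the training fold only; the test fold keeps the 70/30 ratio.
    train_idx = undersample(train_idx, y, rng)
    model = make_pipeline(
        SimpleImputer(strategy="mean"),  # mean imputation, fitted on the training fold
        StandardScaler(),                # z-score standardization
        RandomForestClassifier(n_estimators=200, max_depth=15,
                               criterion="entropy", random_state=0),
    )
    model.fit(X.iloc[train_idx], y[train_idx])
    pred = model.predict(X.iloc[test_idx])
    scores.append(f1_score(y[test_idx], pred, average="weighted"))

print("weighted F1 per fold:", np.round(scores, 3))
print(f"mean weighted F1: {np.mean(scores):.3f}")
```

Fitting the imputer and scaler inside the pipeline, rather than once on the full dataset, is what keeps the cross-validation estimate honest.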
The performance is not good: accuracy is about 74%, weighted recall 75%, weighted precision 72%, and F-measure 65.89%, while precision on the primary class is only 57%.
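To see how weighted averages relate to the per-class numbers, the small sketch below uses made-up labels with a 70/30 split (illustrative only, not the data above). Weighted metrics average the per-class scores using each class's support as the weight, so the majority class dominates them. On a single evaluation, weighted recall is identical to accuracy by definition, so a small gap between those two figures may come from averaging over folds.

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             precision_recall_fscore_support)

# Toy labels with a 70/30 split (illustrative only).
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# Per-class scores, one value per label in order [0, 1].
# Precision: class 0 = 5/6, class 1 = 2/4.
prec, rec, f1, support = precision_recall_fscore_support(y_true, y_pred)

# Weighted scores: per-class values averaged with weights 0.7 and 0.3 (the supports).
w_prec, w_rec, w_f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                         average="weighted")

print("per-class precision:", prec)
print("weighted precision:", round(w_prec, 3))
print("weighted recall:", round(w_rec, 3),
      "accuracy:", accuracy_score(y_true, y_pred))
print(classification_report(y_true, y_pred, digits=3))
```

Here the weighted precision (about 0.733) looks much better than the 0.5 precision on class 1, which is the same pattern as the 72% weighted precision versus 57% on the primary class.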
How can I improve performance? Do you have any suggestions?