
Improve Random forest performance

a_polito Member Posts: 3 Newbie
Hello! :) I'm working on a random forest model to predict a binary label. The dataset is imbalanced, with a class split of roughly 70%/30%. The attributes are numeric and represent financial statement indices or amounts in euros, such as EBITDA.

The process consists of reading the data, selecting features with fewer than 10% missing values, normalization (Z-transformation), replacing missing values with the average, and cross-validation with undersampling of the majority class in the training data. The learner is a random forest with information gain (200 trees, maximum depth 15).

The performance is not good: accuracy is about 74%, weighted recall 75%, weighted precision 72%, and F-measure 65.89 (class precision for the primary class is 57%).

How can I improve performance? Do you have any suggestions?

Best Answer

    rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
    Solution Accepted
    Hello, and hopefully it's not too late to answer:

    It is difficult to answer without knowing the data, and there are several strategies you could try. Could you apply some kind of discretization? Converting continuous values into discrete ones (bins or "buckets") might help. Do you know whether there is any anomaly or trend that might be masked in the data? Those are the ideas I can come up with here.
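
    To make the discretization idea concrete, here is a minimal sketch in plain Python (not RapidMiner) of equal-frequency binning for a single numeric attribute. The EBITDA values, the helper names `fit_quantile_bins` and `to_bin`, and the choice of 4 bins are all assumptions for illustration; nothing here comes from the original process.

```python
from bisect import bisect_right
from statistics import quantiles


def fit_quantile_bins(values, n_bins=4):
    """Return the n_bins - 1 interior cut points for equal-frequency bins."""
    return quantiles(values, n=n_bins, method="inclusive")


def to_bin(value, cut_points):
    """Map a value to a bin index between 0 and len(cut_points)."""
    return bisect_right(cut_points, value)


# Made-up EBITDA amounts in euros, skewed like many financial attributes.
ebitda = [-120.0, 15.0, 40.0, 75.0, 90.0, 310.0, 2500.0, 18000.0]

cuts = fit_quantile_bins(ebitda, n_bins=4)    # learned from training data only
binned = [to_bin(v, cuts) for v in ebitda]    # each value becomes a bin index

print(cuts)
print(binned)
```

    Equal-frequency bins are used here because a few very large amounts would otherwise dominate equal-width bins. Inside cross-validation, the cut points should be learned on the training folds only and then applied to the test fold.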

    Also, undersampling can sometimes introduce issues, because the resulting class distribution is artificial. Weighting might be better, if your algorithm supports it.
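
    As a rough illustration of the weighting idea, the sketch below computes "balanced" class weights in plain Python: each class gets the weight n_samples / (n_classes * class_count), so both classes contribute the same total weight without discarding any rows. The 70/30 label list and the helper name `balanced_class_weights` are assumptions for illustration, and whether your learner accepts per-example weights has to be checked in its documentation.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical labels with the 70%/30% split described in the question.
labels = [0] * 70 + [1] * 30

weights = balanced_class_weights(labels)
sample_weights = [weights[y] for y in labels]  # one weight per example

# Total weight carried by each class after weighting.
total_per_class = {
    cls: sum(w for w, y in zip(sample_weights, labels) if y == cls)
    for cls in weights
}
print(weights)
print(total_per_class)
```

    The minority class ends up with the larger per-example weight, so misclassifying it costs more during training, while every original example is still used.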
