
Aggregation / compression instead of forecast / prediction

nicugeorgian Member Posts: 31 Maven
edited November 2018 in Help

I have a data set with both nominal and numerical attributes and a numerical label.

I'm trying to fit some regression tree on this set.

I would like to use the regression tree as an aggregation/compression of the data set rows, not as a forecast. Concretely, the tree will never be applied to unseen data, so overfitting would not be a problem in this case. Of course, I should avoid ending up with as many tree leaves as there are rows in the data set (that wouldn't be an aggregation anymore ;) )

The goal is, however, that the trained model (the regression tree) "predicts / reflects" as much as possible the training data.

Would the regression tree (Weka W-M5P) be the best solution for this problem? If yes, how shall I choose the algorithm's parameters?

I think it would be better if I selected the "no pruning" option ...
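To illustrate the trade-off being asked about, here is a minimal sketch using sklearn's DecisionTreeRegressor as a stand-in for Weka's W-M5P (the library, data, and parameter names are my assumptions, not anything from RapidMiner): an unpruned tree essentially memorizes the training rows (one leaf per row), while a minimum-leaf-size constraint yields fewer leaves, i.e. an actual aggregation.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical numerical training data with a numerical label
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

# Unpruned tree: grows until every training row sits in its own leaf,
# so it "predicts/reflects" the training data almost perfectly
full = DecisionTreeRegressor(random_state=0).fit(X, y)

# Constrained tree: at least 10 rows per leaf, so at most 10 leaves --
# a genuine compression of the 100 rows into a few group averages
compressed = DecisionTreeRegressor(min_samples_leaf=10, random_state=0).fit(X, y)

print("unpruned leaves:", full.get_n_leaves(), "R^2:", full.score(X, y))
print("compressed leaves:", compressed.get_n_leaves(), "R^2:", compressed.score(X, y))
```

The `min_samples_leaf` knob plays the same role as a pruning/minimum-instances parameter in W-M5P: it caps how fine-grained the "aggregation" gets.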

Any ideas?



    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Whether the regression tree is the best algorithm depends on your needs. If you want an understandable model, choose it. Otherwise, different alternatives are possible and possibly better. But you might have to transform your data then, because LinearRegression or SVMs don't support nominal values.

    The best parameters for learners depend on your data, so you have to try it out.
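A minimal sketch of the transformation mentioned above, assuming pandas/sklearn (the data and column names are made up for illustration; in RapidMiner the equivalent step would be a nominal-to-numerical conversion): dummy-coding turns each nominal attribute into indicator columns that a linear model can consume.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data set with one nominal and one numerical attribute
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": [1.0, 2.0, 3.0, 4.0],
    "label": [10.0, 20.0, 12.0, 30.0],
})

# Replace the nominal "color" column with 0/1 indicator columns
X = pd.get_dummies(df[["color", "size"]], columns=["color"])
model = LinearRegression().fit(X, df["label"])

print(X.columns.tolist())
```

After this step the feature table is purely numerical, so LinearRegression or an SVM can be trained on it directly.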
