# "The regression trees returned by the operators W-M5P and W-REPTree"

nicugeorgian (Member, Posts: **31**, Maven)

Hello,

In the text version of the regression trees returned by W-M5P and W-REPTree: how should one read a tree branch (leaf) of the following form:

`attribute = RU,PK,TW,TR,IT <= 0.5 : : LM5 (798/81.241%)`

Does `attribute = RU,PK,TW,TR,IT <= 0.5` mean that *attribute* is **not** among the values *RU,PK,TW,TR,IT*?

*LM5* is defined below the tree, and I assume it represents the value predicted (forecasted) for that leaf, correct?

What do the numbers 798 and 81.241% represent?

It seems to me that, in my example, *attribute* is treated as numerical although it's categorical (nominal). Is there a way to specify this before the regression trees are run?

Many thanks for any idea!

Cheers,

Geo

## Answers

The first number is the number of training instances falling into this leaf, and the second number is the root mean squared error of the linear model on these training examples divided by the global absolute deviation.

As far as I know, the nominal attributes are all converted internally into binary attributes, which are then handled as numerical (hence the split value 0.5). I don't think you can change this behavior, since it is one of the basic ideas of the M5 algorithm.

Cheers,

Ingo
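To make the reading above concrete, here is a minimal Python sketch. It assumes the leaf text has exactly the shape quoted in the question; `parse_m5p_leaf` and `binary_indicator` are hypothetical helper names, not part of Weka or RapidMiner. The indicator also assumes that membership in the listed values maps to 1 and everything else to 0, so that `<= 0.5` selects the rows whose value is not in the list. The thread does not confirm the direction of that mapping.

```python
import re

# Matches the trailing "LMn (count/percent%)" part of a leaf line.
# Assumption: the layout is the one shown in the single example from this thread.
LEAF_RE = re.compile(
    r"(?P<model>LM\d+)\s*\((?P<count>\d+)/(?P<percent>\d+(?:\.\d+)?)%\)"
)


def parse_m5p_leaf(line):
    """Return (linear model name, training instances, error percentage)."""
    match = LEAF_RE.search(line)
    if match is None:
        raise ValueError(f"not an M5P-style leaf: {line!r}")
    return (
        match.group("model"),
        int(match.group("count")),
        float(match.group("percent")),
    )


def binary_indicator(value, listed_values):
    """Assumed binary encoding: 1 if the nominal value is in the list, else 0."""
    return 1 if value in listed_values else 0


leaf = "attribute = RU,PK,TW,TR,IT <= 0.5 : : LM5 (798/81.241%)"
model, count, percent = parse_m5p_leaf(leaf)
print(model, count, percent)

listed = {"RU", "PK", "TW", "TR", "IT"}
for country in ("DE", "IT"):
    # Under the assumed encoding, the "<= 0.5" branch is taken when the indicator is 0.
    print(country, binary_indicator(country, listed) <= 0.5)
```

Under these assumptions a value such as `DE` follows the `<= 0.5` branch, while `IT` does not.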

Thanks for the explanations. What exactly do you mean by "global absolute deviation"? Do you mean the global *average* absolute deviation, defined as the average of all the absolute differences between every element of the *whole* sample (not only the instances falling into that leaf) and the mean of the *whole* sample set?

Is there a document where I can see the exact definitions of the numbers in the tree's leaves?

Thanks in advance!

Cheers,

Geo
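The definition proposed in this question can be written out as a short Python sketch. It is only the reading suggested here, not a confirmed description of what Weka prints; the function names and the sample numbers are made up for illustration.

```python
import math


def mean_absolute_deviation(values):
    """Average absolute difference between each value and the mean of all values."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def rmse(actual, predicted):
    """Root mean squared error of the predictions for one leaf."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def leaf_error_percent(leaf_actual, leaf_predicted, all_targets):
    """Leaf RMSE divided by the deviation of the whole sample, as a percentage.

    Assumption: "global absolute deviation" means the mean absolute deviation
    of the label over the whole training sample, as proposed in the question.
    """
    return 100.0 * rmse(leaf_actual, leaf_predicted) / mean_absolute_deviation(all_targets)


# Made-up data: six training labels, three of which fall into the leaf.
all_targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
leaf_actual = [1.0, 2.0, 3.0]
leaf_predicted = [1.5, 2.0, 2.5]

print(mean_absolute_deviation(all_targets))
print(leaf_error_percent(leaf_actual, leaf_predicted, all_targets))
```

If the denominator is in fact a different measure of spread, only `mean_absolute_deviation` would need to be swapped out.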

Cheers,

Ingo

This is the output (RapidMiner version 4.4):

W-REPTree

REPTree

============

Intensity < 0.98 : 0.23 (240/0.48) [144/0.49]

Intensity >= 0.98 : -0.07 (1754/0.47) [853/0.48]

Size of the tree : 3

or to simplify, for each leaf we have

Condition : A (B/C) [D/E]

I'm guessing that:

A is the label or predicted class

B is the number of training samples found at this leaf and used to calculate the statistics

C is the RMSE (root mean squared error) when 'A' is used as the prediction for the B samples, divided by the global absolute deviation

... but I don't know what D or E are...

Any help would be appreciated.

Thanks!
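The `Condition : A (B/C) [D/E]` layout above can be pulled apart mechanically, which may help when comparing leaves. This is a minimal Python sketch, assuming every leaf line looks like the two shown in this post; `parse_reptree_leaf` is a hypothetical helper, and the field names A to E simply follow the labels used above without claiming what D or E mean.

```python
import re

NUMBER = r"-?\d+(?:\.\d+)?"

# Condition : A (B/C) [D/E]
# Assumption: the layout matches the two leaf lines quoted in this post.
LEAF_RE = re.compile(
    rf"^(?P<condition>.+?)\s*:\s*(?P<A>{NUMBER})\s*"
    rf"\((?P<B>{NUMBER})/(?P<C>{NUMBER})\)\s*"
    rf"\[(?P<D>{NUMBER})/(?P<E>{NUMBER})\]$"
)


def parse_reptree_leaf(line):
    """Split one REPTree leaf line into its condition and the five numbers."""
    match = LEAF_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a REPTree-style leaf: {line!r}")
    fields = match.groupdict()
    parsed = {"condition": fields.pop("condition")}
    # Everything left in the dict is one of the numeric fields A..E.
    parsed.update({name: float(text) for name, text in fields.items()})
    return parsed


lines = [
    "Intensity < 0.98 : 0.23 (240/0.48) [144/0.49]",
    "Intensity >= 0.98 : -0.07 (1754/0.47) [853/0.48]",
]
for line in lines:
    print(parse_reptree_leaf(line))
```

Keeping D and E as plain numbers leaves their interpretation open until it is checked against the Weka documentation or source.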

I'm sorry, but I'm completely unfamiliar with the Weka learners. Unlike Ingo, I don't even have the source code to take a deeper look. Did you search the Weka mailing list for information about that?

Greetings,

Sebastian
