
# "Insufficient results with M5P regression tree"

michaelhecht

Hello,

If anyone is interested, please try the following:

Produce a file containing two columns:

x = 0, 0.1, 0.2, ..., 12.6;

y = sin(x)
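A minimal Python sketch for producing this data, assuming a plain CSV file with an `x,y` header. The original post loaded the data through a RapidMiner ExampleSource, so the CSV format and the helper names `make_sine_rows` and `to_csv` are illustrative assumptions:

```python
import csv
import io
import math

def make_sine_rows(step=0.1, x_max=12.6):
    """Return (x, sin(x)) pairs for x = 0, step, 2*step, ..., x_max."""
    n_steps = int(round(x_max / step))
    rows = []
    for i in range(n_steps + 1):
        # Compute x from the integer index to avoid drift from repeated += step.
        x = round(i * step, 10)
        rows.append((x, math.sin(x)))
    return rows

def to_csv(rows):
    """Serialize the rows as CSV text with an 'x,y' header line."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["x", "y"])
    writer.writerows(rows)
    return buf.getvalue()
```

Writing it to disk is then one line, e.g. `open("sinus.csv", "w", newline="").write(to_csv(make_sine_rows()))`; the file name is hypothetical.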

Then apply M5P (with or without normalization).

The result is quite disappointing. Does anyone know how to get an acceptable result?

I expected to get something like a piecewise linear approximation of the sine function, but got something far from it.

Thank you.
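To make the expected result concrete, the following Python sketch fits a separate least-squares line to each of k equal-width segments of the data and compares it with a piecewise constant fit (the segment mean). It is not M5P: the split points are fixed in advance instead of being chosen by a tree, and the function names are hypothetical. It only illustrates how much closer linear leaves can follow sin(x) than constant ones.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line through the points; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # a single point gets a flat line
    return slope, my - slope * mx

def piecewise_max_error(xs, ys, k, linear=True):
    """Max absolute training error of a fit with k equal-width segments.

    linear=True fits y = a*x + b per segment; linear=False uses the
    segment mean (y = const), i.e. a piecewise constant fit.
    """
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / k
    buckets = [[] for _ in range(k)]
    for x, y in zip(xs, ys):
        # Clamp so that x == hi lands in the last segment.
        buckets[min(int((x - lo) / width), k - 1)].append((x, y))
    worst = 0.0
    for seg in buckets:
        if not seg:
            continue
        sx = [x for x, _ in seg]
        sy = [y for _, y in seg]
        if linear:
            slope, intercept = fit_line(sx, sy)
        else:
            slope, intercept = 0.0, sum(sy) / len(sy)
        worst = max(worst, max(abs(y - (slope * x + intercept)) for x, y in seg))
    return worst

xs = [i / 10 for i in range(127)]  # 0.0, 0.1, ..., 12.6
ys = [math.sin(x) for x in xs]
for k in (2, 4, 8, 16):
    const_err = piecewise_max_error(xs, ys, k, linear=False)
    lin_err = piecewise_max_error(xs, ys, k, linear=True)
    print(f"k={k:2d}  constant: {const_err:.3f}  linear: {lin_err:.3f}")
```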


## Answers

I just tried a process in which the only changes from the default settings are checking the boxes for parameters N, U, and R. The plot of x vs. prediction(y) looks, to my eyes, much more sine-like. But I don't know whether using an unpruned, unsmoothed learner makes sense for your problem.

Keith

michaelhecht

Sorry, here is the XML:

```xml
<operator name="Root" class="Process" expanded="yes">
  <!-- Reads the data via its attribute description file -->
  <operator name="ExampleSource" class="ExampleSource">
    <parameter key="attributes" value="C:\Programme\Rapid-I\RapidMiner-4.4\sinus"/>
  </operator>
  <operator name="Normalization" class="Normalization">
  </operator>
  <!-- Weka M5P: U = use unsmoothed predictions, M = minimum number of instances per leaf -->
  <operator name="W-M5P" class="W-M5P">
    <parameter key="keep_example_set" value="true"/>
    <parameter key="U" value="true"/>
    <parameter key="M" value="10.0"/>
  </operator>
  <!-- Applies the learned model to the example set passed through by W-M5P -->
  <operator name="ModelApplier" class="ModelApplier">
    <list key="application_parameters">
    </list>
  </operator>
</operator>
```

What I get is a piecewise constant result, i.e. the leaves of the tree are of the form y = const.

Only the last leaf gives a linear model: y = 3.2196 * x - 4.5545

If every leaf of the tree had such a "real" linear model, the result would be fine, i.e. what I expected.

None of the settings improves this, even though the tree could have y = a*x + b in each leaf, which should give a better prediction. So why doesn't M5P behave like this?

If I select the smoothed tree the results are even worse.
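For reference, the smoothing that Weka's U option switches off can be sketched as follows, assuming the rule described for Quinlan's M5 (on which Weka's M5P, via the M5' variant, is based): on the way from a leaf back to the root, the running prediction p is blended with the output q of each ancestor's own linear model as (n*p + k*q)/(n + k), where n is the number of training instances in the child node the example came from and k is a smoothing constant (15 in Quinlan's description). Weka's exact implementation may differ, and `smooth_prediction` is a hypothetical name.

```python
def smooth_prediction(leaf_prediction, path, k=15.0):
    """Blend a leaf prediction with the models of its ancestors (M5-style).

    path: list of (n_child, model_output) pairs ordered from the leaf's
    parent up to the root. n_child is the number of training instances in
    the child node the example passed through; model_output is the parent
    node's own linear-model prediction for this example.
    """
    p = leaf_prediction
    for n_child, q in path:
        # Small children (n_child << k) are pulled strongly toward q.
        p = (n_child * p + k * q) / (n_child + k)
    return p
```

For example, a leaf predicting 1.0 from 10 instances under a parent whose model predicts 0.0 gives 10/25 = 0.4. With small leaves the blend is dominated by the ancestors' models, which may be one reason smoothing hurts on a wavy target like sin(x); this is a plausible reading, not something confirmed in this thread.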

I hope this makes my "problem" clearer.

P.S.:

If you google for "stepwise regression tree HUANG" or go directly to http://www.landcover.org/pdf/ijrs24_p75.pdf and look at page 77 (page 3 of the 16-page document), you will see what I mean. I would appreciate it if this SRT algorithm became part of RapidMiner, even though I still don't understand why M5P doesn't behave comparably.

michaelhecht

Nevertheless, I cannot understand why the fraction of constant leaves (y = const) increases when I change M from 5 to 6.

I get 10 more constant leaves at positions where y = a*x + b would be better. Isn't a constant regression in a leaf worse than a non-constant one?

It's clear to me that the algorithm comes from Weka and not from RapidMiner, so you cannot know in detail what happens inside it.

Still, I only want to understand why increasing M increases the number of constant leaves even when this makes the result worse.
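One mechanism that can make a constant leaf win in an M5-style learner is the error compensation used when simplifying linear models: Quinlan's M5 multiplies a model's average absolute residual on the training instances by (n + v)/(n - v), where n is the number of instances and v the number of parameters, and drops terms when that lowers the estimate. The Python sketch below applies only this comparison (v = 2 for y = a*x + b, v = 1 for y = const) to two small segments of sin(x). It is a simplified illustration under that assumption, not Weka's W-M5P code, and it does not by itself explain the effect of changing M.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx

def compensated_error(xs, ys, slope, intercept, v):
    """Average absolute residual times the (n + v) / (n - v) factor from M5."""
    n = len(xs)
    if n <= v:
        return math.inf  # not enough instances to support v parameters
    mae = sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / n
    return mae * (n + v) / (n - v)

def prefers_constant(xs, ys):
    """True if y = const scores no worse than y = a*x + b after compensation."""
    slope, intercept = fit_line(xs, ys)
    linear = compensated_error(xs, ys, slope, intercept, v=2)
    constant = compensated_error(xs, ys, 0.0, sum(ys) / len(ys), v=1)
    return constant <= linear

peak_x = [1.4, 1.5, 1.6, 1.7]             # around the maximum at pi/2: nearly flat
steep_x = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]  # near x = 0: nearly a straight line
print("peak:", prefers_constant(peak_x, [math.sin(x) for x in peak_x]))
print("steep:", prefers_constant(steep_x, [math.sin(x) for x in steep_x]))
```

On the nearly flat segment the small gain of the slope term does not pay for the larger penalty factor, so the constant model wins; on the nearly straight segment the linear model wins clearly.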

By the way, if you are an expert, could you post a workflow that optimizes the parameters automatically? So far I haven't got a feel for how to apply meta methods like grid search or cross-validation correctly.

Thanks in advance. (Above all I need an answer to my question; the workflow would be nice.)

Keith

As for the parameter optimization, take a look at 07_Meta/01_ParameterOptimization.xml in the RapidMiner samples directory. The GridParameterOptimization operator is where you specify which parameters you want to vary.
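Outside RapidMiner, the idea behind combining a grid search with cross-validation can be sketched in a few lines of Python. This is not the GridParameterOptimization operator: as a hypothetical stand-in for a learner parameter such as M, it tunes the number of equal-width segments k of a piecewise linear fit, scores each candidate by the mean absolute error on held-out folds, and keeps the best one. All function names are illustrative.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return slope, my - slope * mx

def fit_piecewise(xs, ys, k, lo, hi):
    """Fit one line per equal-width segment of [lo, hi]; return a predictor."""
    width = (hi - lo) / k

    def segment(x):
        # Clamp to a valid segment index (x == hi belongs to the last one).
        return min(max(int((x - lo) / width), 0), k - 1)

    buckets = [([], []) for _ in range(k)]
    for x, y in zip(xs, ys):
        bx, by = buckets[segment(x)]
        bx.append(x)
        by.append(y)
    mean_y = sum(ys) / len(ys)
    # A segment without training points falls back to the overall mean.
    lines = [fit_line(bx, by) if bx else (0.0, mean_y) for bx, by in buckets]

    def predict(x):
        slope, intercept = lines[segment(x)]
        return slope * x + intercept

    return predict

def cv_error(xs, ys, k, folds=5, seed=0):
    """Mean absolute error of the k-segment model under `folds`-fold cross-validation."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    # Segment boundaries come from the full x range so every held-out point maps to a segment.
    lo, hi = min(xs), max(xs)
    total, count = 0.0, 0
    for f in range(folds):
        held_out = set(order[f::folds])
        train = [i for i in order if i not in held_out]
        predict = fit_piecewise([xs[i] for i in train], [ys[i] for i in train], k, lo, hi)
        for i in held_out:
            total += abs(ys[i] - predict(xs[i]))
            count += 1
    return total / count

def grid_search(xs, ys, grid, folds=5):
    """Score every candidate in `grid` by cross-validation; return the best one and all scores."""
    scores = {k: cv_error(xs, ys, k, folds) for k in grid}
    best = min(scores, key=scores.get)
    return best, scores

xs = [i / 10 for i in range(127)]
ys = [math.sin(x) for x in xs]
best_k, scores = grid_search(xs, ys, grid=[1, 2, 4, 8, 16])
print(best_k, {k: round(s, 4) for k, s in scores.items()})
```

The key design point, which carries over to RapidMiner's meta operators, is that each candidate is judged on data it was not trained on, so the search does not simply reward the most flexible setting.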

michaelhecht

I originally tested M5P on this problem only to get an idea of how it works.

In the end, I have real doubts about applying this method to other data that I'm not familiar with.