# How to discretize the target attribute

christian1983 (Member, Posts: **11**, Contributor II)

Hello everybody,

One part of my diploma thesis is essentially to solve regression problems with classifiers, so I have to discretize the target attribute. RapidMiner provides three main methods for this: Discretize by Binning, by Size, and by Frequency. However, the number of bins has a substantial effect on the accuracy of the resulting model, so I wonder how to determine the optimal number of bins.
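To make the difference between the binning strategies concrete, here is a minimal Python sketch of equal-width and equal-frequency discretization of a numeric target. It illustrates the general ideas only and is not RapidMiner's implementation; the function names and the sample values are made up for this example.

```python
from bisect import bisect_right

def equal_width_edges(values, n_bins):
    """Interior cut points splitting [min, max] into n_bins equal-width ranges."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]

def equal_frequency_edges(values, n_bins):
    """Interior cut points so that each bin holds roughly the same number of examples."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]

def assign_bins(values, edges):
    """Map each value to the index of the bin it falls into."""
    return [bisect_right(edges, v) for v in values]

# Hypothetical target values with one outlier.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]
width_bins = assign_bins(target, equal_width_edges(target, 4))
freq_bins = assign_bins(target, equal_frequency_edges(target, 4))
print(width_bins)  # [0, 0, 0, 0, 0, 0, 0, 3]
print(freq_bins)   # [0, 0, 1, 1, 2, 2, 3, 3]
```

With the outlier present, equal-width binning puts almost all examples into one class, while equal-frequency binning keeps the classes balanced. This is one reason why both the method and the number of bins matter for the classifier.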

According to the well-known data mining literature, there are many rules of thumb, such as number of bins = max{1, 2 log(l)}, where l is the number of examples. There is also an interesting method that uses a wrapper approach to determine the optimal number of bins (http://reference.kfupm.edu.sa/content/r/e/regression_using_classification_algorith_88480.pdf). Unfortunately, I did not find a way to do this in RapidMiner; even the optimization operator, with the number of bins of the discretization operator as the selected parameter, does not produce a reasonable result.
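As a quick worked example of the rule of thumb, the following Python sketch evaluates max{1, 2 log(l)} for a few sizes. The rule as quoted does not state the base of the logarithm or how to round, so the natural logarithm and rounding to the nearest integer are assumptions here; the function name is also made up.

```python
import math

def rule_of_thumb_bins(n_examples, log=math.log):
    """Number of bins = max{1, 2 * log(l)}, rounded to the nearest integer.

    Assumptions: natural log by default, nearest-integer rounding.
    """
    return max(1, round(2 * log(n_examples)))

print(rule_of_thumb_bins(1))                      # 1
print(rule_of_thumb_bins(1000))                   # 14
print(rule_of_thumb_bins(1000, log=math.log10))   # 6
```

The result changes considerably with the base of the logarithm, so such a rule is best treated as a starting value for a search rather than as a final answer.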

Maybe one of you has an interesting idea that could help solve my problem.

Thank you very much.

## Answers

2,531 Unicorn

Since you are probably studying computer science, I would suggest simply implementing the algorithm yourself.

Anyway, this should work using a Parameter Optimization in combination with a cross-validation. Where is the problem? In measuring the error? You will have to think about converting the polynominal value that represents the range back to a numerical value in order to measure the regression performance, but this is also possible in RapidMiner with some string replacement, Generate Attributes applications, and Parse Numbers.
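Outside RapidMiner, the same idea (try several bin counts, cross-validate, convert the predicted class back to a number, and measure the regression error) can be sketched in plain Python. This is only an illustration under stated assumptions, not the method from the linked paper: the toy 1-nearest-neighbour classifier, the equal-frequency binning, the use of the class mean as the numeric representative, and the synthetic data are all choices made for this example.

```python
from bisect import bisect_right
from statistics import mean

def equal_frequency_edges(values, n_bins):
    """Interior cut points for roughly equal-sized bins (duplicates removed)."""
    ordered = sorted(values)
    n = len(ordered)
    return sorted(set(ordered[(i * n) // n_bins] for i in range(1, n_bins)))

def nearest_neighbor_class(train_x, train_labels, x):
    """Toy 1-nearest-neighbour classifier on a single numeric feature."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_labels[nearest]

def cv_error_for_bins(xs, ys, n_bins, n_folds=5):
    """Cross-validated mean absolute error for one candidate bin count."""
    errors = []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(xs)) if i % n_folds == fold]
        train_idx = [i for i in range(len(xs)) if i % n_folds != fold]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        # Discretize the target using the training fold only.
        edges = equal_frequency_edges(train_y, n_bins)
        labels = [bisect_right(edges, y) for y in train_y]
        # Numeric representative of each class: mean target of its training examples.
        representative = {
            c: mean(y for y, lab in zip(train_y, labels) if lab == c)
            for c in set(labels)
        }
        for i in test_idx:
            predicted = nearest_neighbor_class(train_x, labels, xs[i])
            # Convert the predicted class back to a number and score it.
            errors.append(abs(representative[predicted] - ys[i]))
    return mean(errors)

def best_number_of_bins(xs, ys, candidates):
    """Return the candidate with the lowest cross-validated error, plus all scores."""
    scores = {k: cv_error_for_bins(xs, ys, k) for k in candidates}
    return min(scores, key=scores.get), scores

# Synthetic data: a simple linear relationship.
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 for x in xs]
best_k, scores = best_number_of_bins(xs, ys, candidates=[2, 4, 8])
print(best_k, scores)
```

The important detail is that the error is measured on the numeric scale after the predicted class has been mapped back to a value. Classification accuracy alone would favour very few bins, because coarse classes are easier to predict.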

Greetings,

Sebastian