
12-09-2016 09:42 PM - edited 12-09-2016 09:49 PM

Attached are some relevant pictures of my setup and stats on the target variable:

1) The setup of my .rmp process, 2) Histogram of the target variable, 3) Plot of CO2 (target variable) vs. the first principal component

I understand that cross-validation is the best way to compare models, especially between classification models like decision trees and numerical models like linear regression. I have been learning about cross-validation in my RapidMiner class, but I am not 100% sure what exactly accuracy, precision, and recall mean across the classification and regression Performance operators. For example, I would like to use precision and class recall to compare models, but I don't know what they would be for regression, because the confusion matrix is based on a nominal label, not a numerical label.

So how can I compare a decision tree and linear regression using Cross Validation (X-Validation)? What statistic or metric could I use?

Below are the stats output from my results:

Target Variable Stats (CO2 Emissions):
- Average: 87,405.93
- Deviation: 628,363.80

Performance of Linear Regression:
- root_mean_squared_error: 34,017.261 +/- 5,548.473 (mikro: 34,467.846 +/- 0.000)
- normalized_absolute_error: 0.151 +/- 0.040 (mikro: 0.140)

Performance of Decision Tree:
- accuracy: 90.65% +/- 4.13% (mikro: 90.64%)
- root_mean_squared_error: 0.282 +/- 0.063 (mikro: 0.289 +/- 0.000)
- normalized_absolute_error: 3.143 +/- 5.800 (mikro: 1.209)
- Avg. Class Precision: 62.1%
- Avg. Class Recall: 68%

Performance of Random Forest:
- accuracy: 82.14% +/- 1.92% (mikro: 82.14%)
- root_mean_squared_error: 0.392 +/- 0.025 (mikro: 0.393 +/- 0.000)
- normalized_absolute_error: 1.017 +/- 0.117 (mikro: 0.993)
- Avg. Class Precision: 86.3%
- Avg. Class Recall: 40%

Performance of Neural Network:
- root_mean_squared_error: 23,815.976 +/- 4,305.543 (mikro: 24,211.353 +/- 0.00)
- normalized_absolute_error: 0.126 +/- 0.037 (mikro: 0.122)

Performance of Generalized Linear Model (default values):
- root_mean_squared_error: 219,027.497 +/- 45,537.878
- normalized_absolute_error: 1.017 +/- 0.117 (mikro: 0.993)
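(For readers outside RapidMiner, the comparison being asked about can be sketched in scikit-learn: score both a regression tree and linear regression with the same metric on identical cross-validation folds. The dataset and model parameters below are purely illustrative, not the poster's actual data.)

```python
# Hypothetical sketch: compare a decision tree (as a regressor) and linear
# regression on the same numeric label, using one shared metric (RMSE)
# and identical cross-validation folds for both models.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
cv = KFold(n_splits=10, shuffle=True, random_state=0)  # same folds for both models

for name, model in [("Linear Regression", LinearRegression()),
                    ("Decision Tree", DecisionTreeRegressor(max_depth=5, random_state=0))]:
    # scikit-learn returns negated RMSE (higher is better), so flip the sign
    scores = -cross_val_score(model, X, y, cv=cv,
                              scoring="neg_root_mean_squared_error")
    print(f"{name}: RMSE {scores.mean():.3f} +/- {scores.std():.3f}")
```

Because both models are scored on the same numeric label with the same folds and metric, the two RMSE figures are directly comparable, which is exactly what the mixed accuracy-vs-RMSE numbers above are not.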

1 ACCEPTED SOLUTION
## Re: How can compare decision tree and linear regression using Cross-Validated or X-Validated?

12-11-2016 12:24 AM - edited 12-11-2016 12:26 AM

Accepted by topic author mylane3, 08-16-2017 02:14 PM
What you might want to do is transform the regression Performance into a classification result by discretizing the label and prediction variables with the same rules you applied for your domain-expert-defined bins.

Then you are comparing like for like.
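The binning idea can be sketched outside RapidMiner as follows (the cut points, bin names, and values here are illustrative, not from the poster's data):

```python
# Hypothetical sketch: turn regression output into a classification result
# by binning the true label and the prediction with the SAME cut points,
# then scoring standard classification metrics on the binned values.
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

edges = [50_000, 150_000, 500_000]          # illustrative interior cut points
names = ["low", "medium", "high", "very high"]

y_true = np.array([12_000, 60_000, 200_000, 700_000, 40_000])
y_pred = np.array([15_000, 55_000, 180_000, 450_000, 90_000])  # regression output

# np.digitize maps each value to a bin index; apply it identically to both
true_bins = [names[i] for i in np.digitize(y_true, edges)]
pred_bins = [names[i] for i in np.digitize(y_pred, edges)]

print(accuracy_score(true_bins, pred_bins))  # → 0.6
print(classification_report(true_bins, pred_bins, zero_division=0))
```

The key point is that the identical cut points are applied to both the label and the prediction, so the regression model's confusion matrix is built on exactly the same nominal classes as the classification models'.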

**However**

One caution about your classification predictions: think about how your classification model is scored on misclassifications.

Imagine you have a numerical label with values 1 to 10.

After binning, your label has the following nominal values:

Value 1: 1-3

Value 2: 4-6

Value 3: 7-9

Value 4: 10

Now if your classification model predicts that an example with an original numeric value of 3 belongs to group 'Value 2: 4-6', then although this is a misclassification, it is actually more accurate than a prediction of 'Value 4: 10'. However, looking purely at accuracy, precision, and recall won't reflect this: both misclassifications, as 'Value 4: 10' and as 'Value 2: 4-6', contribute the same performance value of 0... which is just not correct.

I would recommend that you use the **Performance (Costs)** operator and create a misclassification cost matrix. That way you can reflect that misclassifications into nearby groups are 'less costly' than those into more distant groups.
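The idea behind such a cost matrix can be sketched in plain NumPy (the bins match the example above; the predictions are made up for illustration):

```python
# Hypothetical sketch of a misclassification-cost matrix for ordered bins:
# a near miss costs less than a distant one, mirroring the idea behind
# RapidMiner's Performance (Costs) operator.
import numpy as np

classes = ["1-3", "4-6", "7-9", "10"]
n = len(classes)

# cost = distance between bin indices: 0 on the diagonal (correct prediction),
# 1 for an adjacent bin, up to n-1 for the farthest bin
cost = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))

y_true = ["1-3", "4-6", "7-9", "10", "1-3"]
y_pred = ["4-6", "4-6", "10", "7-9", "10"]  # mix of near and far misses

idx = {c: i for i, c in enumerate(classes)}
total = sum(cost[idx[t], idx[p]] for t, p in zip(y_true, y_pred))
print(total / len(y_true))  # average misclassification cost → 1.2
```

Under plain accuracy, the near miss ('1-3' predicted as '4-6') and the far miss ('1-3' predicted as '10') count identically; the distance-based cost matrix charges 1 for the former and 3 for the latter.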

-- Training, Consulting, Sales in China, Hong Kong & Taiwan --

www.RapidMinerChina.com


