Question data

andre5007 · April 2021

I have two CSV files, each with several features.
Feat1 is the model, Feat2 is a power measurement, Feat3 is a flag for something the object either has (1) or does not have (0), Feat4 is a feature whose meaning I don't know, Feat5 is the device installation date, Feat6 and Feat7 are the latitude and longitude, and Feat8 is the number of maintenance interventions.
The training CSV has values for Feat8; the test CSV does not.
My goal is to estimate Feat8 for the test set.
How can I do this?
Thanks

yyhuang · April 2021

Hi @andre5007, it looks like your prediction target is numerical (integers). Are you sure you want to build a decision tree or any other predictive model for classification rather than regression? I would parse the label as a number and try a regression decision tree, or GLM/GBT for regression.

andre5007 · April 2021

Hi @yyhuang
Why do you think regression decision trees or GLM/GBT for regression are better?
Thanks
André

yyhuang · April 2021

Hi @andre5007, my point was that regression suits your data better than classification because the label is an integer. For the difference between regression and classification, see https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/
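
To make the distinction concrete, here is a minimal Python sketch (not RapidMiner, and not taken from the thread) that scores the same predictions both ways. The numbers are hypothetical intervention counts invented for illustration. A classification metric only asks whether a prediction is exactly right, while a regression metric also measures how far off it is.

```python
# Hypothetical true and predicted intervention counts (illustrative only).
y_true = [0, 1, 2, 4, 7]
y_pred = [0, 2, 2, 5, 3]

# Classification view: a prediction is either exactly right or wrong.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Regression view: the size of each miss matters.
mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"accuracy = {accuracy:.2f}")
print(f"mean absolute error = {mae:.2f}")
```

Under accuracy, predicting 5 for a true value of 4 is penalized exactly as much as predicting 3 for a true value of 7; the mean absolute error distinguishes the two.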

Telcontar120 · April 2021

You should review the RapidMiner tutorials for Cross Validation and for Apply Model. Basically you are going to define Feat 8 as the label and build your model on that, and then you are going to save that model and apply it to the 2nd dataset.
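
Outside RapidMiner, the same train-then-apply flow can be sketched in plain Python. This is only an illustration under assumptions: the column names `feat2` and `feat8` and the inline data are hypothetical stand-ins for the two CSV files, and a one-feature least-squares line stands in for whatever model you actually choose.

```python
import csv
import io

# Hypothetical stand-ins for the two CSV files; column names are assumptions.
TRAIN_CSV = """feat2,feat8
10,1
20,2
30,3
40,4
"""
TEST_CSV = """feat2
15
35
"""

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

train = read_rows(TRAIN_CSV)
test = read_rows(TEST_CSV)

# Feat 8 is the label; it exists only in the training data.
xs = [float(r["feat2"]) for r in train]
ys = [float(r["feat8"]) for r in train]
slope, intercept = fit_line(xs, ys)

# Apply the fitted model to the unlabelled test rows.
predictions = [slope * float(r["feat2"]) + intercept for r in test]
print(predictions)
```

The key point is that the label column is used only while fitting; the test rows contribute features alone and receive predictions.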

andre5007 · April 2021

OK, I will take a look and try it. If I have any questions, can you help me?

andre5007 · April 2021

Can someone tell me whether I'm on the right track, please?

Image: https://us.v-cdn.net/6030995/uploads/editor/la/hydm0pfyzdyq.png

yyhuang · April 2021

Hi @andre5007,

The workflow looks fine if you have your own test set. However, as Brian mentioned above, cross validation is always a smart option on your training set.

https://academy.rapidminer.com/learn/article/cross-validation
https://academy.rapidminer.com/learn/video/validating-a-model
https://rapidminer.com/blog/validate-models-cross-validation/
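
For readers who want to see the mechanics, here is a rough Python sketch of k-fold cross validation. It is not how RapidMiner implements the operator: the labels are made up, the "model" is a deliberately trivial baseline that predicts the training mean, and the folds are contiguous rather than shuffled to keep the code short.

```python
import math

def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def fit_mean(ys):
    """Baseline 'model': always predict the mean of the training labels."""
    return sum(ys) / len(ys)

def cross_validate(ys, k):
    """Return the RMSE of the baseline on each held-out fold."""
    scores = []
    for fold in k_fold_indices(len(ys), k):
        held_out = set(fold)
        train = [y for i, y in enumerate(ys) if i not in held_out]
        prediction = fit_mean(train)
        errors = [(ys[i] - prediction) ** 2 for i in fold]
        scores.append(math.sqrt(sum(errors) / len(errors)))
    return scores

# Hypothetical intervention counts, for illustration only.
labels = [0, 1, 2, 4, 7, 3, 1, 0, 2, 5]
scores = cross_validate(labels, k=5)
print("RMSE per fold:", [round(s, 2) for s in scores])
print("mean RMSE:", round(sum(scores) / len(scores), 2))
```

Each row is held out exactly once, so the averaged score uses all of the training data without ever scoring a row the model was fitted on.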

HTH!

YY

andre5007 · April 2021

I just noticed that the screenshot I sent was wrong: it was not the one I meant to select.

I put a filter at the beginning because one value was missing, and that caused an error.
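
For comparison, the same idea of dropping incomplete rows looks roughly like this in plain Python. The column names and data are hypothetical, and this is only a sketch of the filtering step, not of the RapidMiner operator.

```python
import csv
import io

# Hypothetical CSV with one missing feat2 value; column names are assumptions.
RAW = """feat1,feat2,feat8
A,10,1
B,,2
C,30,3
"""

rows = list(csv.DictReader(io.StringIO(RAW)))

# Keep only rows in which every field is non-empty.
complete = [r for r in rows if all(v.strip() for v in r.values())]

print(len(rows), "rows read,", len(complete), "rows kept")
```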

Then, inside the Cross Validation, I placed the Decision Tree on the training side, and Apply Model and Performance on the testing side.

Then I connected the Cross Validation to another Apply Model, and I also fed that Apply Model the test data set for which I have to predict Feat 8.

Do you think I should change anything in the operators' parameters? I only changed what was necessary to get the process to run.

What do you think I can improve? Or am I on the right path now?

Thanks
Best regards

André

Image: https://us.v-cdn.net/6030995/uploads/editor/3i/wwakdy4y35b0.png

Image: https://us.v-cdn.net/6030995/uploads/editor/an/buidlr6j7meo.png

Telcontar120 · April 2021

Looks like a good setup for basic model construction and validation with an additional out-of-sample validation.

andre5007 · April 2021

Can you explain how I can improve the value that I marked in red?
Thanks

rugmanasokan · September 2022

Regression is a better fit for your data than classification because the label is an integer. To understand the difference between regression and classification, see https://nimblebox.ai/blog/regression-machine-learning
