Question data
I have two CSV files, each with several features. Feat1 is the model; Feat2 is a power measure; Feat3 indicates whether the object has a certain property (1 = has it, 0 = does not); Feat4 is a feature whose meaning I don't know; Feat5 is the device installation date; Feat6 and Feat7 are the latitude and longitude; and Feat8 is the number of maintenance interventions. The training CSV has values for Feat8 and the test CSV does not. My goal is to estimate Feat8 for the test set. How can I do this? Thanks
Best Answers
yyhuang
Hi @andre5007, it looks like your prediction target is numerical (integers). Are you sure you want to build a decision tree or another predictive model for classification rather than regression? I would parse the label into numbers and try regression decision trees or GLM/GBT for regression.
yyhuang
Hi @andre5007, my point was that regression is a better fit than classification for your data, because the label is an integer. For the difference between regression and classification, see https://machinelearningmastery.com/classification-versus-regression-in-machine-learning/
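To make the distinction concrete, here is a small sketch in plain Python. The intervention counts are invented for illustration and do not come from the CSV files in this thread. It scores two sets of predictions both ways: classification accuracy treats every miss as equally wrong, while mean absolute error (a typical regression metric) reflects how far off each prediction is.

```python
# Hypothetical Feat8 values (number of maintenance interventions).
# All numbers below are made up for illustration only.
actual = [0, 2, 3, 5, 8]
pred_close = [1, 3, 4, 6, 9]   # every prediction is off by exactly one
pred_far = [7, 9, 10, 0, 1]    # every prediction is far off


def accuracy(y_true, y_pred):
    """Classification view: a prediction is either exactly right or wrong."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Regression view: the size of each miss matters."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


for name, pred in [("close", pred_close), ("far", pred_far)]:
    print(name, accuracy(actual, pred), mean_absolute_error(actual, pred))
```

Both prediction sets get the same accuracy because neither hits a count exactly, but the error metric separates them clearly. That is the practical reason to treat an integer count label as a regression target.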
Answers
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
OK, I will try that. If I have any questions, can you help me?
The workflow looks fine if you have your own test set. However, as Brian mentioned above, cross validation is always a smart option on your training set.
https://academy.rapidminer.com/learn/article/cross-validation
https://academy.rapidminer.com/learn/video/validating-a-model
https://rapidminer.com/blog/validate-models-cross-validation/
HTH!
YY
I put a filter at the beginning because one value was missing, which caused an error.
Then, in the Cross Validation operator, I placed the Decision Tree in the training subprocess, and Apply Model and Performance in the testing subprocess.
Then I connected the Cross Validation output to another Apply Model and fed that Apply Model the test data set, for which I have to predict Feat8.
Do you think I should change anything in the operators' parameters? I only changed what was necessary to get the process to run.
What do you think I can improve? Am I on the right path?
Thanks. Best regards
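For readers who want to see the same workflow outside the RapidMiner GUI, the steps described above (filter missing values, cross-validate a tree on the training data, then apply a model to the unlabeled test data) can be sketched roughly in standard-library Python. This is only an illustration under assumptions: the rows are invented, a single numeric feature stands in for the real columns, and `fit_stump` is a hypothetical one-split regression tree, not RapidMiner's Decision Tree operator.

```python
import random
import statistics


def drop_missing(rows):
    """Keep only rows without missing (None) values, like the filter step."""
    return [r for r in rows if all(v is not None for v in r)]


def fit_stump(rows):
    """Fit a one-split regression tree on (x, y) rows.

    Tries the midpoint between each pair of adjacent distinct x values and
    keeps the split with the lowest sum of squared errors.
    """
    xs = sorted({x for x, _ in rows})
    overall = statistics.mean(y for _, y in rows)
    best = (float("inf"), None, overall, overall)
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in rows if x <= thr]
        right = [y for x, y in rows if x > thr]
        lm, rm = statistics.mean(left), statistics.mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict(model, x):
    """Return the mean label of the side of the split that x falls on."""
    thr, lm, rm = model
    if thr is None:  # no split was possible; fall back to the overall mean
        return lm
    return lm if x <= thr else rm


def cross_validate(rows, k=3, seed=0):
    """Estimate mean absolute error with k-fold cross-validation."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    errors = []
    for i, held_out in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        model = fit_stump(train)
        errors.extend(abs(predict(model, x) - y) for x, y in held_out)
    return statistics.mean(errors)


# Invented training rows: (power measure, number of interventions).
training = [(10, 1), (12, 1), (None, 2), (15, 2), (30, 6), (32, 7),
            (35, 6), (11, 1), (31, 7), (14, 2), (33, None)]
test_power = [13, 34]  # invented test rows with no label

clean = drop_missing(training)
print("cross-validated MAE:", cross_validate(clean, k=3))

final_model = fit_stump(clean)  # retrain on all cleaned training rows
print([predict(final_model, x) for x in test_power])
```

The cross-validation step only estimates how well the approach generalizes. The predictions for the test rows come from a model refit on all of the cleaned training data.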
Can you explain how I can improve the value that I marked in red? Thanks