Data mining
Hi guys, I ran into a problem on a job and I don't know where to start. I'm really new to RapidMiner and would like to know if anyone could help me. I have to estimate Feature 8, which is the number of maintenance interventions the device has had. What can I do? Thanks, André
Best Answer

yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 363 RM Data Scientist
Answers
I worked on your training data a bit and built regression trees on the cleaned features. The predictive model performs quite well under 10-fold cross-validation. The RMSE is as follows
My process is attached for your reference.
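For readers without the attached process, here is a minimal sketch of how a 10-fold cross-validated RMSE is computed. It is not the attached RapidMiner process: it swaps the regression tree for a small k-nearest-neighbours regressor to stay short, and the data is synthetic. The names `knn_predict` and `cross_validated_rmse` and the two made-up features are assumptions for illustration only.

```python
import math
import random


def knn_predict(train_X, train_y, x, k=3):
    """Predict by averaging the targets of the k nearest training rows."""
    dists = sorted(
        (math.dist(row, x), target) for row, target in zip(train_X, train_y)
    )
    nearest = dists[:k]
    return sum(target for _, target in nearest) / len(nearest)


def cross_validated_rmse(X, y, n_folds=10, k=3, seed=0):
    """Return the RMSE pooled over every held-out prediction."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    # Each row lands in exactly one fold, so it is scored exactly once.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    squared_errors = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in indices if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in fold:
            pred = knn_predict(train_X, train_y, X[i], k)
            squared_errors.append((pred - y[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic stand-in data (NOT the thread's CSV files): two numeric
# features and an integer count target, loosely mimicking Feature 8.
rng = random.Random(42)
X = [[rng.uniform(0, 10), rng.uniform(0, 5)] for _ in range(200)]
y = [round(0.8 * a + 0.5 * b + rng.gauss(0, 0.5)) for a, b in X]

rmse = cross_validated_rmse(X, y)
print(f"10-fold CV RMSE: {rmse:.3f}")
```

The point of the fold loop is that every row is predicted by a model that never saw it, so the RMSE estimates performance on new devices rather than on the training rows.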
Cheers,
YY
Thanks
André
I used the CSV files you posted in another thread. They are attached here as well.
Cheers,
YY
What should I understand from this?
PS: feat1 could result in data leakage if we apply target encoding to a categorical attribute with so many distinct values. I don't have the context here, but you can try dropping it by configuring "Target Encoding".
PPS: You can round the predictions after scoring if you prefer integers.
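To illustrate the leakage concern from the PS, here is a small sketch in plain Python. It does not reproduce RapidMiner's "Target Encoding" operator; the function names and the five toy rows are hypothetical. It shows that with naive target encoding, a category that appears only once is encoded as its own target value, and compares that with a leave-one-out variant.

```python
from collections import defaultdict


def naive_target_encode(categories, targets):
    """Encode each row as the mean target over ALL rows of its category."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    return [sums[c] / counts[c] for c in categories]


def leave_one_out_encode(categories, targets):
    """Encode each row without using that row's own target.

    A category seen only once falls back to the mean of all other rows.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    total, n = float(sum(targets)), len(targets)
    encoded = []
    for c, t in zip(categories, targets):
        if counts[c] > 1:
            encoded.append((sums[c] - t) / (counts[c] - 1))
        else:
            encoded.append((total - t) / (n - 1))
    return encoded


# Hypothetical toy rows: a high-cardinality ID-like attribute and a count target.
feat1 = ["dev_A", "dev_B", "dev_C", "dev_D", "dev_A"]
feat8 = [3, 0, 7, 1, 5]

naive = naive_target_encode(feat1, feat8)
loo = leave_one_out_encode(feat1, feat8)
print(naive)  # singleton categories reproduce the target exactly
print(loo)
```

In the naive output, the rows for dev_B, dev_C, and dev_D carry their own target value as a feature, which is the leakage. If feat1 is close to a unique identifier, dropping it, as suggested, is the simpler fix.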
HTH!
André
I hope it makes sense
André
According to your definition, the model is predicting "Feat 8, which is the number of maintenance interventions."
I will stick to regression models (KNN, regression tree, Random Forest, GLM, and GBT are all good choices) because you are predicting a numerical target. If the target were categorical, say true/false or broken/normal, then you would use classification.
Besides visualization for data exploration and outlier detection, you can also use some of the outlier detection models (e.g. Tukey test for exponential distribution... )
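As a rough sketch of the Tukey idea, the snippet below applies the standard Tukey fences (values beyond 1.5 times the interquartile range from the quartiles). This is the generic rule, not the variant adjusted for exponential distributions mentioned above, and the function name and sample counts are made up for illustration.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside the Tukey fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical intervention counts with one suspicious value.
counts = [0, 1, 1, 2, 2, 2, 3, 3, 4, 15]
print(tukey_outliers(counts))
```

For strongly skewed data such as counts, the symmetric fences tend to flag too many large values, which is presumably why a distribution-aware variant is suggested.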
I fully understand why you use the regression method and why classification is not the best fit here. But I was at a loss as to why you don't use, for example, the associations & correlations methods. Is there a reason?