How do I get better predictions?
Hi,
I'm using a regression model to predict sales values (the label attribute) so that I can identify data points whose sales values could be wrong. A value counts as potentially wrong when the predicted value deviates strongly from the original value.
However, within the same (artificial) data set, some predictions are quite close to the original values, while for others the error is above 50%. Using a forecasting model (e.g. ARIMA) doesn't make sense to me, since I'm not trying to forecast future values for another example set. Rather, I'm trying to check whether sales values are right or wrong and flag the ones that are potentially wrong.
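The flagging rule described above can be sketched in plain Python (not a RapidMiner process). The function name `flag_suspicious`, the 0.5 threshold (matching the "above 50%" error mentioned) and the sample values are all hypothetical, chosen only to illustrate comparing predicted and original values by relative deviation.

```python
def flag_suspicious(actual, predicted, threshold=0.5):
    """Return indices whose relative deviation |actual - predicted| / |actual| exceeds threshold."""
    flagged = []
    for i, (a, p) in enumerate(zip(actual, predicted)):
        if a == 0:
            # Relative error is undefined for a zero original value; flag it for manual review.
            flagged.append(i)
        elif abs(a - p) / abs(a) > threshold:
            flagged.append(i)
    return flagged

# Hypothetical original sales values and model predictions.
actual = [100.0, 250.0, 80.0, 40.0]
predicted = [95.0, 240.0, 130.0, 41.0]
print(flag_suspicious(actual, predicted))  # [2]
```

Only the third value deviates by more than 50% (50 / 80 = 62.5%), so it is the only one flagged.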
So I was wondering: could the predicted sales values differ this much because the data is artificial rather than real?
Does anyone have a suggestion for other ways to check sales values with supervised learning methods?
Thank you!
Best Answer
varunm1
You need to stick with one validation type when you compare models. I see that in 1. you used an 80:20 split and in 4. you used 90:10. Trying to improve or study model performance by varying split ratios is not recommended: different split ratios put different data samples into the training and test sets of 1. and 4., so you get different results. Also, set a fixed random seed on your split operators so that every run produces the same split and therefore the same results.
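To see why the split ratio and the seed matter, here is a minimal plain-Python sketch of a shuffled train/test split (an illustration, not RapidMiner's Split Data operator). The function name `split` and the seed value 42 are arbitrary assumptions. The same seed reproduces the same partition, while a different ratio changes which rows end up in the test set.

```python
import random

def split(rows, train_ratio, seed):
    """Shuffle a copy of rows with a fixed seed and cut it into train and test parts."""
    rng = random.Random(seed)  # local generator: the seed affects only this split
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(10))  # stand-ins for 10 examples
first = split(rows, 0.8, seed=42)
second = split(rows, 0.8, seed=42)
print(first == second)  # True: same seed, same partition

other = split(rows, 0.9, seed=42)
print(len(first[1]), len(other[1]))  # 2 1: different ratio, different test set
```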
Second, I strongly recommend validating models with cross-validation instead of random splits. The goal is not a more accurate model but a more stable and reliable estimate of how well the model performs.
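As a rough illustration of what k-fold cross-validation does, here is a plain-Python sketch. It uses a single-predictor least-squares line as a stand-in model and made-up data; both are assumptions for the example, not part of the original process. Every row is tested exactly once, and the spread of the per-fold RMSE hints at how much a single random split could vary.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (single predictor)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def rmse(actual, predicted):
    """Root mean squared error of two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def cross_validate(xs, ys, k=5, seed=42):
    """Return the RMSE of each of k folds; every row is used for testing exactly once."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [a + b * xs[i] for i in test]
        scores.append(rmse([ys[i] for i in test], preds))
    return scores

# Hypothetical data: a noisy linear relationship.
rng = random.Random(0)
xs = [float(i) for i in range(50)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 5) for x in xs]
scores = cross_validate(xs, ys, k=5)
print(f"RMSE per fold: {[round(s, 2) for s in scores]}")
print(f"mean {statistics.fmean(scores):.2f} +/- {statistics.stdev(scores):.2f}")
```

Reporting the mean together with the standard deviation across folds is what makes the estimate more trustworthy than one split.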
RMSE (root mean squared error) is an error measure, so lower is better, and it is not a percentage. It is computed from the residuals (the deviations between predicted and true values), so it is expressed in the same units as your label. For example, if you are predicting the number of packages sold and the RMSE is 15, the quadratic mean (root mean square) of the errors is 15 packages.
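The packages example can be checked with a short Python sketch. The four actual and predicted values are made up so that every prediction is off by exactly 15 packages, which gives an RMSE of 15 in the label's own unit.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: same unit as the label (e.g. packages sold)."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical packages sold vs. model predictions (each off by 15).
actual = [120, 80, 100, 60]
predicted = [105, 95, 85, 75]
print(rmse(actual, predicted))  # 15.0
```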
Feature generation means deriving new features from existing attributes. You can use the "Automatic Feature Engineering" operator, or do it yourself with Generate Attributes based on your domain knowledge. The link below gives you an idea.
https://rapidminer.com/blog/data-prep-feature-generation-selection/
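As a plain-Python analogue of deriving attributes by hand (roughly what expressions in Generate Attributes do), here is a small sketch. The attribute names `unit_price`, `competitor_price`, `month` and `promo_days`, and the 30-day month, are hypothetical assumptions; the derived features deliberately avoid using the label itself, since a feature computed from the sales value would leak the answer.

```python
def generate_features(row):
    """Return a copy of an example with new attributes derived from existing ones."""
    new = dict(row)
    # Ratio of two existing attributes (hypothetical names).
    new["price_ratio"] = row["unit_price"] / row["competitor_price"]
    # Domain-knowledge flag: Q4 holiday season (assumed to matter for sales).
    new["is_q4"] = 1 if row["month"] >= 10 else 0
    # Share of the month with a promotion; assumes 30-day months for simplicity.
    new["promo_share"] = row["promo_days"] / 30
    return new

example = {"unit_price": 2.5, "competitor_price": 2.0, "month": 11, "promo_days": 6}
print(generate_features(example))
# {'unit_price': 2.5, 'competitor_price': 2.0, 'month': 11, 'promo_days': 6, 'price_ratio': 1.25, 'is_q4': 1, 'promo_share': 0.2}
```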
Try feature selection, feature generation (if possible) and dimensionality reduction (if needed, depending on the number of attributes and samples), then optimize your models' hyperparameters (Optimize Parameters (Grid)) and cross-validate.
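The combination of a parameter grid with cross-validation can be sketched in plain Python. This is only an analogy to running a cross-validation for each parameter combination inside an optimization loop; the k-nearest-neighbours regressor, its `k_neighbors` grid and the sine-shaped data are swapped-in assumptions for illustration, not the asker's model. Each candidate is scored on the same folds and the one with the lowest mean RMSE wins.

```python
import math
import random
import statistics

def knn_predict(train_x, train_y, x, k):
    """Predict the mean label of the k nearest training points (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return statistics.fmean([train_y[i] for i in nearest])

def cv_rmse(xs, ys, k_neighbors, folds=5, seed=42):
    """Mean RMSE over `folds` cross-validation folds for one hyperparameter value."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)  # same seed => every candidate sees the same folds
    scores = []
    for f in range(folds):
        test = idx[f::folds]
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        sq = [(ys[i] - knn_predict(tx, ty, xs[i], k_neighbors)) ** 2 for i in test]
        scores.append(math.sqrt(statistics.fmean(sq)))
    return statistics.fmean(scores)

def grid_search(xs, ys, grid):
    """Cross-validate every candidate; return (best_value, all_scores)."""
    results = {k: cv_rmse(xs, ys, k) for k in grid}
    best = min(results, key=results.get)
    return best, results

# Hypothetical noisy, non-linear data.
rng = random.Random(1)
xs = [i / 10 for i in range(100)]
ys = [math.sin(x) * 10 + rng.gauss(0, 1) for x in xs]
best_k, scores = grid_search(xs, ys, grid=[1, 3, 5, 9, 15])
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

Fixing the seed for the folds keeps the comparison fair: differences between candidates come from the hyperparameter, not from different data samples.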
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Answers
Dortmund, Germany
Also, how did you build your models? Did you use any feature selection or generation?
Did you check the correlations between the predictors and the label? That would also give us some idea.
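A quick way to check this outside RapidMiner is to compute the Pearson correlation of each numeric predictor with the label. The sketch below assumes hypothetical attributes `ad_spend` and `store_size` with a `sales` label and made-up values; correlation only captures linear relationships, so a weak value does not rule out a useful non-linear predictor.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns nan if either column is constant."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)

def correlations_with_label(rows, label):
    """Correlate every predictor with the label, strongest (by absolute value) first; nan last."""
    ys = [r[label] for r in rows]
    result = {name: pearson([r[name] for r in rows], ys) for name in rows[0] if name != label}
    return sorted(result.items(),
                  key=lambda kv: -1.0 if math.isnan(kv[1]) else abs(kv[1]),
                  reverse=True)

# Hypothetical examples.
rows = [
    {"ad_spend": 10, "store_size": 300, "sales": 110},
    {"ad_spend": 20, "store_size": 250, "sales": 205},
    {"ad_spend": 30, "store_size": 320, "sales": 290},
    {"ad_spend": 40, "store_size": 280, "sales": 410},
]
for name, r in correlations_with_label(rows, "sales"):
    print(f"{name}: {r:+.2f}")
```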
Varun
One more thing I don't understand: when I use Deep Learning to predict, the performance changes every time I press the "start execute" button, even though nothing else changes.
Thank you for the help!