Im using a regression model to predict sales values (label attribute) in order to select those data points where sales values could be potentially wrong. This is defined by deviations between predicted value and original value.
However some predictions are quite close to the original values and some the error rate is above 50 % within the same (artificial) data set. Using a forecasting model (e.g. ARIMA) does not make sense to me, since im not trying to forecast future values for another example set. But rather trying to check if sales values are wrong or right/flag as potentially wrong.
So I was thinking could prediction of the sales value be quite different, because the data is not based on real data?
Does anyone have a suggestion on how to recheck sales values otherwise with supervised learning methods?