Average
Hello
For one data set I need a label, so I use the average of the rows for that. My question is: why is the accuracy very low with a simple label, but 95% when I use the average as an attribute and then create an "UP"/"DOWN" label based on that average?
What do you think about it?
Does the average introduce correlation into the data? (I think correlation in the data is not good.) Why is the result not normal?
Apart from the average, which statistic would you suggest?
Thank you in advance
sara
Best Answer
Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
Hi Sara, your question is not clear to me. Perhaps you could provide a sample process or data file. But in general, if the average contains information from other examples that are not part of your training set, then yes, you could have information leakage that would bias your model and make it appear stronger than it really is.
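To make the leakage point concrete, here is a minimal Python sketch (not a RapidMiner process). It assumes hypothetical random data with five numeric values per example and an "UP"/"DOWN" label cut at the average; the names `rows`, `leaky_cutoff`, `train_cutoff` and `label` are made up for illustration. The only point is the order of operations: split first, then compute the average from the training examples alone.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical data: each example is a row of 5 numeric values.
rows = [[random.gauss(0, 1) for _ in range(5)] for _ in range(200)]

# Split first, so nothing computed afterwards can see the test examples.
random.shuffle(rows)
train, test = rows[:150], rows[150:]

# Leaky: the cut-off is the average over ALL examples, test rows included.
leaky_cutoff = mean(mean(r) for r in rows)

# Safer: the cut-off is computed from the training examples only.
train_cutoff = mean(mean(r) for r in train)


def label(row, cutoff):
    """Return "UP" if the row average is above the cut-off, else "DOWN"."""
    return "UP" if mean(row) > cutoff else "DOWN"


# Both partitions are labelled with the training-only cut-off.
train_labels = [label(r, train_cutoff) for r in train]
test_labels = [label(r, train_cutoff) for r in test]

print("cut-off from all rows:     ", leaky_cutoff)
print("cut-off from training rows:", train_cutoff)
```

The two cut-offs differ slightly because the first one has already "seen" the test rows. With a single global average the effect is usually small, but the same pattern applies to any statistic computed before the split.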
Answers
Thank you for the answer. In this situation, how can I create a label for my data? (Instead of the average, what would you suggest using as a label?)
Sorry, the data is not mine and I cannot share it.
Regards
Sara
If you want to treat this as a classification problem rather than predict a numerical value, you could also define a threshold value and then create a nominal attribute that specifies whether the attribute is above or below that threshold. So, for example, if the value was a sales amount, then you could define a threshold for a "high value" transaction at, say, $1000 and then classify individual examples as either high value or not high value based on that threshold. But you would probably want to consult with a domain expert for the data that you have to determine a threshold like that.
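As a rough illustration of the threshold idea, here is a small Python sketch. The sales amounts are invented, the function name `value_class` is hypothetical, and treating an amount of exactly 1000 as "high value" is an assumption; the post does not say which side the boundary falls on.

```python
# Example cut-off taken from the post; a domain expert would normally choose it.
HIGH_VALUE_THRESHOLD = 1000.0


def value_class(amount, threshold=HIGH_VALUE_THRESHOLD):
    """Map a numeric amount to a nominal class (boundary counted as high value)."""
    return "high value" if amount >= threshold else "not high value"


# Hypothetical sales amounts, for illustration only.
sales = [250.0, 1000.0, 4120.5, 999.99]
labels = [value_class(amount) for amount in sales]

for amount, lab in zip(sales, labels):
    print(amount, lab)
```

Because the threshold comes from domain knowledge rather than from the data, it does not depend on which examples end up in the training or test set.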
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
I already had a label like the one you mentioned. It was useful, but very similar to the average approach: I computed the average of one attribute, and values below or above that average determined my label. So it still brings correlation into the data, and the resulting accuracy is more than 95%.
Anyway, thank you for your answer.
Sara
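The effect described in this last post can be reproduced with a small Python sketch. Everything here is an assumption made for illustration: the data is random, `x` stands for the attribute the label was derived from, `z` is another attribute only loosely related to it, and the "model" is a simple one-threshold rule rather than any RapidMiner learner. When the label is just "x above or below its average" and x stays among the predictors, the learner only has to rediscover that cut-off, so the accuracy is close to 100% by construction.

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical data: x is the attribute the label is derived from,
# z is another attribute that is only loosely related to x.
xs = [random.gauss(0, 1) for _ in range(400)]
zs = [x + random.gauss(0, 1) for x in xs]

# Cut-off = average of x over the training part only (first 300 examples).
cutoff = mean(xs[:300])
labels = ["UP" if x > cutoff else "DOWN" for x in xs]


def predict(value, threshold):
    return "UP" if value > threshold else "DOWN"


def accuracy(values, targets, threshold):
    hits = sum(predict(v, threshold) == t for v, t in zip(values, targets))
    return hits / len(values)


def fit_threshold(values, targets):
    # Try every training value as a threshold and keep the most accurate one.
    return max(sorted(values), key=lambda t: accuracy(values, targets, t))


train, test = slice(0, 300), slice(300, 400)

t_x = fit_threshold(xs[train], labels[train])  # source attribute kept
t_z = fit_threshold(zs[train], labels[train])  # source attribute removed

acc_with_source = accuracy(xs[test], labels[test], t_x)
acc_without_source = accuracy(zs[test], labels[test], t_z)

print(f"source attribute kept:    {acc_with_source:.2f}")
print(f"source attribute removed: {acc_without_source:.2f}")
```

The second number is the more honest estimate of how well the remaining attributes predict the label. If a derived label is used, the attribute it was derived from (or the average itself) would normally be excluded from the predictors.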