Help with Naive Bayes
Hello guys
I need help. I have an assignment where I have to predict the "y" value using the Naive Bayes algorithm. The data I have to use is very big, complex, and doesn't make much sense to me. I posted it in the "train" file. The values from "Xo" to "X8" are letters, and the others are binary. The only numeric (real-valued) attribute is "y". I converted "y" to a nominal attribute (because Naive Bayes needs a nominal label) and set it as the label. I also put the process in the archive. I watched many tutorials, but I think the problem is with my data. I know that I need to remove some irrelevant attributes, but there are just too many for my knowledge and current skills. In the performance results I got 0.32% accuracy. Could you please look at my RapidMiner process and train files and tell me what to do and how? Or maybe this isn't the right algorithm for my problem; that's also an option.
Thank you so much
Best Answer

lionelderkrikor, Moderator, RapidMiner Certified Analyst

Hi @yoak95,
1/ Your label takes real values and is continuous. Thus, from my point of view, you have a regression problem and not a classification problem, so a Naive Bayes model cannot be used.
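A quick way to see why the nominal label gives such poor accuracy: when a real-valued label is converted to nominal, nearly every distinct value becomes its own class, so each class has only one or two examples to learn from. The label values below are made up for illustration, not taken from the poster's data.

```python
# Hypothetical continuous label values (not the poster's data).
y = [3.14, 2.71, 1.41, 3.14, 1.73, 2.23]

def nominal_classes(values):
    """Treat every distinct real value as its own class, which is roughly
    what happens when a continuous label is converted to nominal."""
    return sorted(set(values))

def distinct_ratio(values):
    """Share of rows that carry a distinct value (close to 1.0 = continuous)."""
    return len(set(values)) / len(values)

classes = nominal_classes(y)
print(len(classes), "classes for", len(y), "rows")  # 5 classes for 6 rows
print(round(distinct_ratio(y), 3))                  # 0.833
```

A ratio close to 1.0 is a strong hint that the label is continuous and the task is regression.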
First, I removed your first eight attributes (the attributes with letters as values) and submitted your data to AutoModel. Indeed, you have a lot of irrelevant attributes: AutoModel automatically detects and removes them from your dataset before building the process.
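For readers who want to understand the idea behind removing irrelevant attributes, here is a minimal sketch of two common checks: dropping (nearly) constant columns and dropping ID-like columns. It is similar in spirit to, but not a reproduction of, what AutoModel does; the thresholds and attribute names (`X10`, `X11`, `row_id`) are hypothetical.

```python
from collections import Counter

def stability(column):
    """Share of rows taken by the most frequent value (1.0 = constant)."""
    most_common_count = Counter(column).most_common(1)[0][1]
    return most_common_count / len(column)

def id_ness(column):
    """Share of distinct values (1.0 = every row unique, like an ID).
    Only meaningful for nominal/binary attributes, not continuous ones."""
    return len(set(column)) / len(column)

def select_attributes(table, max_stability=0.95, max_id_ness=0.9):
    """Keep attributes that are neither (nearly) constant nor ID-like.

    `table` maps attribute name -> list of values. The thresholds are
    illustrative assumptions, not AutoModel's actual defaults.
    """
    kept = []
    for name, column in table.items():
        if stability(column) > max_stability:
            continue  # almost every row has the same value: little information
        if id_ness(column) > max_id_ness:
            continue  # nearly unique per row: behaves like an identifier
        kept.append(name)
    return kept

# Hypothetical attributes for illustration.
table = {
    "X10": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],  # constant
    "X11": [0, 1, 0, 1, 1, 0, 0, 1, 0, 1],  # informative-looking binary
    "row_id": list(range(10)),               # ID-like
}
print(select_attributes(table))  # ['X11']
```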
With this methodology, I obtained relatively good results: around 6% relative error.
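To make the "6% relative error" figure concrete: relative error is commonly defined as the mean of |prediction - actual| / |actual| over all rows. The sketch below assumes that definition; the numbers are made up.

```python
def relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual|, skipping rows where
    actual == 0 (relative error is undefined there)."""
    terms = [abs(p - a) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    return sum(terms) / len(terms)

actual = [10.0, 20.0, 40.0]
predicted = [11.0, 19.0, 42.0]
print(f"{relative_error(actual, predicted):.2%}")  # 6.67%
```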
The attached file contains one of the processes built by AutoModel (a GLM model).
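For intuition about what the GLM does on a continuous label: a GLM with a Gaussian family and identity link reduces to ordinary least squares. The sketch below fits it with the normal equations in plain Python. It omits the regularization and other options that the RapidMiner operator offers, and it fails if attributes are perfectly collinear (the matrix becomes singular). The data is synthetic.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    An intercept column of ones is prepended. No regularization is applied.
    """
    rows = [[1.0] + list(r) for r in X]
    n = len(rows[0])
    # Build X^T X and X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    # Solve by Gaussian elimination with partial pivoting on the augmented matrix.
    a = [row[:] + [b] for row, b in zip(xtx, xty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back-substitution on the upper-triangular system.
    coef = [0.0] * n
    for i in reversed(range(n)):
        coef[i] = (a[i][n] - sum(a[i][j] * coef[j] for j in range(i + 1, n))) / a[i][i]
    return coef  # [intercept, w1, w2, ...]

def predict(coef, x):
    return coef[0] + sum(w * v for w, v in zip(coef[1:], x))

X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1 + 2 * a + 3 * b for a, b in X]  # synthetic: y = 1 + 2*x1 + 3*x2
coef = fit_linear(X, y)
print([round(c, 6) for c in coef])  # [1.0, 2.0, 3.0]
```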
2/ If you absolutely want to use a Naive Bayes model, you have to transform your regression problem into a classification problem:
I created four classes of equal size for your label "y". You can perform this operation on the "prepare target" screen of AutoModel.
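Creating classes of equal size amounts to equal-frequency discretization: sort the label values and cut the ranking into k consecutive chunks. The sketch below assumes that is what the AutoModel step does; in this simple version tied values may be split across neighbouring classes. The values are made up.

```python
def equal_frequency_bins(values, k):
    """Assign each value a class 0..k-1 so the classes have (nearly) equal size.

    Rows are ranked by value and the ranking is cut into k consecutive chunks.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        labels[i] = rank * k // len(values)
    return labels

y = [5.0, 1.0, 3.0, 8.0, 2.0, 7.0, 4.0, 6.0]
print(equal_frequency_bins(y, 4))  # [2, 0, 1, 3, 0, 3, 1, 2]
```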
After training a Naive Bayes model, I obtained 47% accuracy.
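To show what training a Naive Bayes model on the discretized label involves, here is a minimal categorical Naive Bayes with Laplace smoothing. It only handles discrete attributes (letters or 0/1, as described in the question) and is not RapidMiner's implementation. The toy data is made up, and accuracy is measured on the training rows only; a real evaluation should use a held-out split or cross-validation.

```python
from collections import Counter, defaultdict
import math

def train_naive_bayes(rows, labels, alpha=1.0):
    """Categorical Naive Bayes with Laplace smoothing `alpha`.

    `rows` is a list of tuples of discrete attribute values.
    """
    class_counts = Counter(labels)
    counts = defaultdict(Counter)  # counts[(attr_index, cls)][value]
    domains = defaultdict(set)     # distinct values seen per attribute
    for row, cls in zip(rows, labels):
        for j, v in enumerate(row):
            counts[(j, cls)][v] += 1
            domains[j].add(v)
    return class_counts, counts, domains, alpha, len(labels)

def predict_naive_bayes(model, row):
    class_counts, counts, domains, alpha, n = model
    best_cls, best_score = None, -math.inf
    for cls, c in class_counts.items():
        score = math.log(c / n)  # log prior P(class)
        for j, v in enumerate(row):
            num = counts[(j, cls)][v] + alpha
            den = c + alpha * len(domains[j])
            score += math.log(num / den)  # smoothed log P(value | class)
        if score > best_score:
            best_cls, best_score = cls, score
    return best_cls

def accuracy(model, rows, labels):
    hits = sum(predict_naive_bayes(model, r) == y for r, y in zip(rows, labels))
    return hits / len(labels)

# Toy data: one letter attribute and one binary attribute.
rows = [("a", 1), ("a", 1), ("b", 0), ("b", 0), ("a", 0), ("b", 1)]
labels = [0, 0, 1, 1, 0, 1]
model = train_naive_bayes(rows, labels)
print(accuracy(model, rows, labels))  # 1.0
```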
Feel free to try different numbers of classes...
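When trying different numbers of classes, it helps to compare each accuracy with the majority-class baseline. With k equal-size classes, always predicting one class scores about 1/k, so 47% with four classes is well above the 25% baseline, while the same 47% with two classes would barely beat 50%. The labels below are synthetic and perfectly balanced.

```python
from collections import Counter

def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

for k in (2, 3, 4, 5):
    labels = [i % k for i in range(60)]  # synthetic, perfectly balanced labels
    print(k, round(majority_baseline(labels), 3))
# 2 0.5
# 3 0.333
# 4 0.25
# 5 0.2
```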
In the attached file you can find the classification process using a Naive Bayes model built by AutoModel.
Hope this helps,
Regards,
Lionel