[SOLVED] Really basic question, I think I'm applying models wrong.
My first Read Database gets all of the values from the documents (20k documents).
My second Read Database (1k documents) has a value isGood, which is 1 if the value is good, -2 if the value is bad, plus a bunch of other values that were really bad ideas. I set isGood as the label. Should I actually only be passing true/false, or is an integer okay?
I use Nominal to Text to get the "data" field as text.
I then process the document, looking for word frequencies etc.
Is my Naive Bayes even in the right place?
My end goal is that I feed it 1000 known good documents and it can find very similar documents from the first read database... I want my confidence score to be based on document similarity.
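To make the goal concrete, here is a rough sketch in plain Python (not RapidMiner) of what "confidence based on document similarity" could look like. It is only an illustration under my own assumptions: the function names, the toy documents and the choice of cosine similarity over raw word counts are all made up for the example, and real text would need better tokenizing and weighting.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 to 1.0)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(good_docs, candidates):
    """Score each candidate against the summed word counts of the good docs."""
    profile = Counter()
    for text in good_docs:
        profile.update(word_counts(text))
    scores = [
        (doc_id, cosine(word_counts(text), profile))
        for doc_id, text in candidates.items()
    ]
    # Most similar documents first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data standing in for the two databases.
good_docs = ["invoice payment due", "payment received invoice"]
candidates = {"doc1": "invoice payment overdue", "doc2": "holiday photos beach"}

ranking = rank_by_similarity(good_docs, candidates)
for doc_id, score in ranking:
    print(f"{doc_id}\t{score:.3f}")
```

Note that this only uses the known good documents, so it is a similarity ranking rather than a classifier. A classifier such as Naive Bayes needs examples of both good and bad documents.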
I am getting an output that contains confidence, but I'm not sure how to present it. I don't come from a statistical background, so I'm learning on my feet. I appreciate I have a lot to learn, so in 3 weeks' time I'm going to read some books/content about how to use RapidMiner and ML in general. I can only apologize for my ignorance!
Can I use an integer as a label?
Am I using Naive Bayes and Apply Model correctly?
How can I view my data in an easy-to-interpret way? Ideally something like a list of document IDs with their confidence rating.
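For reference, this is roughly the output I am after, written as a small plain-Python sketch rather than a RapidMiner process. Everything in it is an assumption made for illustration: the function names, the toy documents, and the hand-rolled multinomial Naive Bayes with add-one smoothing. It treats the integer values 1 and -2 purely as class names, not as numbers, and prints one line per document ID with its confidence of being good.

```python
import math
from collections import Counter, defaultdict


def train(labelled_docs):
    """labelled_docs: list of (text, label). Labels may be ints such as 1 and -2."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocab = set()
    for text, label in labelled_docs:
        words = text.lower().split()
        word_counts[label].update(words)
        doc_counts[label] += 1
        vocab.update(words)
    return word_counts, doc_counts, vocab


def confidences(model, text):
    """Return {label: probability} for one document, using add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                count = word_counts[label][word] + 1
                score += math.log(count / (total_words + len(vocab)))
        log_scores[label] = score
    # Normalise the log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {label: v / norm for label, v in exp_scores.items()}


# Hypothetical labelled examples: 1 = good, -2 = bad.
training = [
    ("invoice payment due", 1),
    ("payment received invoice", 1),
    ("holiday photos beach", -2),
    ("beach party photos", -2),
]
model = train(training)

# Hypothetical unlabelled documents keyed by document ID.
unlabelled = {"doc1": "invoice payment overdue", "doc2": "photos from the beach"}

report = sorted(
    ((doc_id, confidences(model, text)[1]) for doc_id, text in unlabelled.items()),
    key=lambda row: row[1],
    reverse=True,
)
for doc_id, conf in report:
    print(f"{doc_id}\tconfidence(isGood=1)={conf:.3f}")
```

The point of the sketch is the shape of the result: a list sorted by confidence, with the document ID kept alongside each score so it can be traced back to the original row.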