"Apply model to test set feature selection"
I need to do sentiment analysis to predict "positive" and "negative".
Sorry if this is a double post, but I still don't understand this. I have already watched the video tutorials on rapid-i.com and read http://vancouverdata.blogspot.com/2010/11/text-analytics-with-rapidminer-part-5.html, but I still don't get what I am supposed to do.
Here's the thing:
I have 2000 movie reviews: 1000 of them are in a folder named "Positive" (containing positive reviews) and the other 1000 are in a folder named "Negative" (containing negative reviews).
I have to extract the features from those 2000 reviews, so I used "Process Documents from Files" as in the vancouverdata.blogspot.com tutorial and created a word vector (TF-IDF). The process then produces a result called "ExampleSet" with 12305 attributes. That means I have 12305 features extracted from the 2000 reviews, right?
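To make the "attributes = features" question concrete, here is a minimal sketch in plain Python of how a TF-IDF word vector turns each distinct word into one column. It is not RapidMiner's implementation: the tokenizer, the exact TF-IDF formula (term frequency times log(N/df), no normalization) and the three toy reviews are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(documents):
    """Return (vocabulary, rows): one row per document, one column per distinct term."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(documents)
    # Document frequency: the number of documents in which each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        rows.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, rows

# Toy reviews (made up); a real run would read the files in the two folders.
reviews = ["A great, great movie", "A boring movie", "Great acting but boring plot"]
vocabulary, rows = tfidf_vectors(reviews)
print(len(vocabulary), "attributes:", vocabulary)
```

In this sketch the number of columns equals the number of distinct words across all documents, which is the sense in which 12305 attributes would correspond to 12305 word features.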
From this point, I need to do feature selection. How can I do that? I see there are operators such as Backward Elimination, Forward Selection and so on, but I am confused about how to use them. I downloaded the "feature selection extension" and used "Recursive Feature Elimination (RFE, SVM-RFE)" (this operator uses a top-k method), but I can't find documentation about what exactly this method does to eliminate features. Can you help me?
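As an illustration of what a "top k" selection means in general, here is a small plain-Python sketch that scores every attribute and keeps the k best. The scoring rule (absolute difference of the class means) is an assumption chosen for simplicity; it is not what the RFE operator or Forward Selection does internally, and the attribute names and values are made up.

```python
def top_k_features(rows, labels, names, k):
    """Keep the k attributes whose mean value differs most between the two classes."""
    pos = [r for r, y in zip(rows, labels) if y == "positive"]
    neg = [r for r, y in zip(rows, labels) if y == "negative"]
    scores = []
    for j, name in enumerate(names):
        mean_pos = sum(r[j] for r in pos) / len(pos)
        mean_neg = sum(r[j] for r in neg) / len(neg)
        scores.append((abs(mean_pos - mean_neg), name, j))
    # Highest score first; ties are broken by name so the result is deterministic.
    best = sorted(scores, key=lambda s: (-s[0], s[1]))[:k]
    keep = sorted(j for _, _, j in best)
    selected_names = [names[j] for j in keep]
    reduced_rows = [[r[j] for j in keep] for r in rows]
    return selected_names, reduced_rows

# Hypothetical TF-IDF-like values for four reviews and four word attributes.
names = ["boring", "great", "movie", "plot"]
rows = [
    [0.0, 0.9, 0.5, 0.1],
    [0.0, 0.7, 0.5, 0.0],
    [0.8, 0.0, 0.5, 0.1],
    [0.6, 0.1, 0.5, 0.0],
]
labels = ["positive", "positive", "negative", "negative"]
selected, reduced = top_k_features(rows, labels, names, k=2)
print(selected)
```

Words such as "movie" that look the same in both classes get a score of zero and are dropped, while the remaining columns form the smaller data set passed on to the classifier.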
After the feature selection, I have to train on the data, so I use a classifier (for this example, let's say Naive Bayes). When I use a classifier, it means I train on the ExampleSet, right? Where can I find complete documentation about what exactly the Naive Bayes operator does in RapidMiner to train on the data?
Once the training is done, the model has been created too, right? I want to apply this model to other movie reviews (the test set). I have another 100 movie reviews: 50 of them are in a folder called "pos" (containing positive reviews) and the other 50 are in a folder called "neg" (containing negative reviews). I want to apply the model so that it predicts whether each review is positive or negative. How do I do that?
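The train-then-apply idea can be sketched in plain Python as follows. This is a textbook multinomial Naive Bayes with add-one smoothing, written as an assumption about the general technique and not as a description of the RapidMiner operator; the function names and the toy reviews are hypothetical. The point it tries to show is that the test reviews are scored against the word list learned from the training set, so words that only occur in the test set are ignored.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train_naive_bayes(documents, labels):
    """Count words per class; these counts are the 'model'."""
    word_counts = {label: Counter() for label in set(labels)}
    for doc, label in zip(documents, labels):
        word_counts[label].update(tokenize(doc))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return {
        "vocabulary": vocabulary,
        "word_counts": word_counts,
        "class_counts": Counter(labels),
        "n_docs": len(documents),
    }

def apply_model(model, document):
    """Return the most probable label for one unseen document."""
    vocab_size = len(model["vocabulary"])
    # Words never seen in training are dropped: the model has no attribute for them.
    tokens = [t for t in tokenize(document) if t in model["vocabulary"]]
    best_label, best_score = None, -math.inf
    for label in sorted(model["class_counts"]):
        counts = model["word_counts"][label]
        total = sum(counts.values())
        # Log prior plus the sum of smoothed log likelihoods of each word.
        score = math.log(model["class_counts"][label] / model["n_docs"])
        for t in tokens:
            score += math.log((counts[t] + 1) / (total + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training set standing in for the "Positive" and "Negative" folders.
train_docs = ["great movie, loved it", "wonderful acting and great plot",
              "boring movie, hated it", "terrible plot and boring acting"]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(train_docs, train_labels)

# Toy test set standing in for the "pos" and "neg" folders.
test_docs = ["a great and wonderful film", "boring and terrible"]
test_labels = ["positive", "negative"]
predictions = [apply_model(model, d) for d in test_docs]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(predictions, accuracy)
```

The known labels of the test set are used only at the end, to compare against the predictions and compute the accuracy.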
After that, I need to create a report in Excel format. How can I export the ExampleSet and PerformanceVector to XLS automatically? Is it possible?
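For the report, one low-tech option outside RapidMiner is a CSV file, which Excel opens directly. The sketch below uses only Python's standard `csv` module; it writes CSV and not a real XLS workbook, and the file name, column names and values are hypothetical placeholders for the exported data and the performance figure.

```python
import csv
import os
import tempfile

def export_report(path, attribute_names, rows, predictions, accuracy):
    """Write one line per example plus a final accuracy line to a CSV file."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(list(attribute_names) + ["prediction"])
        for row, prediction in zip(rows, predictions):
            writer.writerow(list(row) + [prediction])
        writer.writerow([])  # blank line between the data and the summary
        writer.writerow(["accuracy", accuracy])

# Hypothetical output location and toy values.
report_path = os.path.join(tempfile.gettempdir(), "sentiment_report.csv")
export_report(
    report_path,
    ["boring", "great"],
    [[0.0, 0.9], [0.8, 0.0]],
    ["positive", "negative"],
    0.95,
)
print("report written to", report_path)
```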
To summarize what I need:
1) Are the 12305 attributes in the "ExampleSet" 12305 features?
2) How can I do feature selection on those 12305 features using Forward Selection or another optimization method?
3) THE MOST IMPORTANT: how can I apply the model generated from the training set to my own test set?
4) Where can I find complete documentation about what exactly an operator does in RapidMiner? (The Rapid-I wiki is not what I expected.)
5) How can I export all the results into an Excel report so they are easy to view and can be opened without RapidMiner?
That's all for now. To be honest, I'm an IT student, but I don't have a background in machine learning, natural language processing, information retrieval, or data mining, so I really need help. I'm a newbie, but I seriously want to learn. Thanks a bunch, and I hope to get an answer as soon as possible.