3 weeks ago
First of all, I'm pretty new to RapidMiner. I'm a student and working with the tool for educational purposes.
I built a model for predicting a binominal outcome using a neural network. I have a training dataset and an unlabelled one for application. Both datasets have the same structure. I managed to train my neural network inside a Cross Validation operator and measure its performance on the training data, and I can also apply the model to my application data. After applying the model to the application data, 3 new columns are created (predicted(outcome), confidence yes/no), but I'm not sure if I'm doing this right. I can't use another Performance operator after applying the model, because it would require labelled input. Is there another way to get the same performance vector as for the training data, so that I can check accuracy, precision and recall for the new application data, or would this require a labelled dataset? How can I check the performance of my model nonetheless?
I'd really appreciate your help!
3 weeks ago
Actually, it sounds like you're doing it right. What I tend to do, before building the model, is split my labelled data into a training set and a test set.
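So you have 3 datasets: the training set, the labelled test set held back from training, and your unlabelled application data.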
Try this now and look at the results. Great, right?
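To make the idea concrete outside of RapidMiner, here is a minimal Python sketch of what a held-back test set gives you. It is not how the Split Validation or Performance operators are implemented; the function names `split_holdout` and `performance` and the example labels are made up for illustration. The point is that accuracy, precision and recall all need the true label next to the prediction, which is why they can be computed on the test set but not on the unlabelled application data.

```python
import random

def split_holdout(rows, test_ratio=0.3, seed=42):
    """Shuffle labelled rows and split them into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def performance(actual, predicted, positive="yes"):
    """Accuracy, precision and recall for a binominal label."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical example: 10 labelled rows split 70/30.
train_rows, test_rows = split_holdout(range(10), test_ratio=0.3)

# Hypothetical held-out labels and the model's predictions for the same rows.
actual    = ["yes", "yes", "no", "no", "yes", "no",  "no", "yes"]
predicted = ["yes", "no",  "no", "no", "yes", "yes", "no", "yes"]
metrics = performance(actual, predicted)
print(metrics)
```

The model is trained only on `train_rows`; the rows in `test_rows` play the role of "new" data whose true outcome you happen to know.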
However, how can you be really sure you can 'trust' your model? You've only tested it once; maybe it just got 'lucky' and in reality it's not going to perform as expected.
There are various ways to make sure you can trust your tested model performance, so after you've tried out Split Validation, I'd like you to read this series of 4 blog posts by @IngoRM and download the repository with sample processes.
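One common way to guard against a single 'lucky' test is to repeat the train/test cycle on several different splits and look at how much the score varies. The Python sketch below shows that idea as a plain k-fold loop. It is only an illustration under assumptions: the "model" is a stand-in that always predicts the most frequent training label, the labels are invented, and the helper names are hypothetical rather than anything from RapidMiner or the blog posts.

```python
import random
import statistics

def k_fold_indices(n_rows, k=5, seed=42):
    """Yield (train_idx, test_idx) pairs; every row is tested exactly once."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def majority_label(labels):
    """Stand-in 'model': always predicts the most frequent training label."""
    return max(sorted(set(labels)), key=labels.count)

def cross_validated_accuracy(labels, k=5, seed=42):
    """Return the per-fold accuracies of the stand-in model."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(labels), k, seed):
        guess = majority_label([labels[j] for j in train_idx])
        hits = sum(1 for j in test_idx if labels[j] == guess)
        scores.append(hits / len(test_idx))
    return scores

labels = ["yes"] * 12 + ["no"] * 8   # hypothetical labelled outcomes
scores = cross_validated_accuracy(labels, k=5)
mean_acc = statistics.mean(scores)   # average performance over the folds
spread = statistics.stdev(scores)    # how much it varies from fold to fold
print(scores, mean_acc, spread)
```

A small spread across folds suggests the performance estimate is stable; a large one suggests a single test result should not be trusted on its own.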
Let us know here how you get on!