"Different performance results online-training vs. model loading"

balamir · Member · Posts: 3 · Contributor I
edited May 2019 in Help
I'm new to RapidMiner and I'm experimenting with setting up my model. I first tried a Random Forest learner. Here is the process tree showing my setup (similar to a tutorial setup).

[screenshot: process tree of the first setup]

I got around 75% accuracy. Then I created another experiment that wrote its model to a file.
[screenshot: process tree of the second setup]

Then I loaded that model and ran the experiment again.
[screenshot: process tree of the third setup]

The last experiment gave me ~97% accuracy (in one run it was 100%).


Did I misunderstand the flow? I assumed the first two experiments would generate the same (or a similar) model, since I give them the same input data set. So why does the loaded model give such high accuracy? I tried it a few times just to make sure it was not a lucky selection of the features.

Thanks for any explanation.

Answers

  • steffen · Member · Posts: 347 · Maven
    Hello balamir and welcome to RapidMiner

    First of all: there is no need to make screenshots. It is sufficient to copy the text in the XML tab of the RapidMiner GUI and post it here. This also has the advantage that we can see all the parameters you have set ;).

    @your question:
    You misunderstood the concept of cross-validation.
    In the first setup, only 9/10 of the dataset is used to create the model in each fold; the rest (1/10) is used to calculate the accuracy.
    In the next setup you use 10/10 of your data to create the model. Then you apply that model 10 times, each time to 1/10 of the dataset.
    The key difference is that in your first setup the data you use for validation has NOT been used to create the model, unlike in your second/third setup.
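    To make the difference concrete, here is a minimal Python sketch (not RapidMiner code) that only does the index bookkeeping for both setups and counts how many test rows the model has already seen during training. It assumes rows are numbered 0..n-1 and splits them into interleaved folds; the real cross-validation operator may shuffle or stratify the rows, and the function names are hypothetical.

```python
"""Index bookkeeping for the two setups discussed in this thread.

Hypothetical sketch: rows are plain integers 0..n-1, folds are interleaved
(no shuffling or stratification), and no RapidMiner API is involved.
"""


def k_fold_splits(n_rows, k):
    """Yield (train, test) index lists for k-fold cross-validation.

    Each row lands in exactly one test fold, and the model for that fold
    is trained only on the remaining rows, i.e. about (k-1)/k of the data.
    """
    rows = list(range(n_rows))
    for fold in range(k):
        test = rows[fold::k]  # every k-th row, offset by the fold number
        test_set = set(test)
        train = [r for r in rows if r not in test_set]
        yield train, test


def train_on_all_splits(n_rows, k):
    """Mimic the second/third setup: one model built from ALL rows,
    then applied to each of the k folds in turn."""
    rows = list(range(n_rows))
    for fold in range(k):
        yield rows, rows[fold::k]


def seen_test_rows(splits):
    """Count test rows that were also part of the training data."""
    return sum(len(set(train) & set(test)) for train, test in splits)


if __name__ == "__main__":
    n, k = 100, 10
    print("cross-validation:", seen_test_rows(k_fold_splits(n, k)))
    print("train on all:    ", seen_test_rows(train_on_all_splits(n, k)))
```

    With 100 rows and 10 folds, cross-validation reports 0 previously seen test rows, while the train-on-all setup reports 100: every row it is evaluated on was part of its training data.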

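    Since the model in the third setup has already seen all the data, nothing can surprise it, so the accuracy is much higher. This is what we call overfitting: the model fits the training data so closely that accuracy measured on that same data says little about how it will perform on new data.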
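    The effect can be reproduced without RapidMiner. The Python sketch below swaps in a 1-nearest-neighbour classifier for the Random Forest (an assumption chosen because it memorizes its training data, which makes the effect easy to see) and uses synthetic data whose labels are random, so no model can genuinely do better than about 50%. All function names are hypothetical.

```python
import random


def predict_1nn(train, x):
    """1-nearest-neighbour: return the label of the closest training point."""
    best = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x)))
    return best[1]


def accuracy(train, test):
    """Fraction of test rows whose label the model predicts correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)


def make_noise_data(n, dims, seed):
    """Random features with labels drawn independently of them,
    so there is no real pattern for a model to learn."""
    rng = random.Random(seed)
    return [([rng.random() for _ in range(dims)], rng.randint(0, 1)) for _ in range(n)]


def cross_validated_accuracy(data, k):
    """First setup: each fold is scored by a model that never saw it."""
    scores = []
    for fold in range(k):
        test = data[fold::k]
        train = [row for i, row in enumerate(data) if i % k != fold]
        scores.append(accuracy(train, test))
    return sum(scores) / k


def train_on_all_accuracy(data, k):
    """Second/third setup: one model built from all rows, scored on each fold."""
    scores = [accuracy(data, data[fold::k]) for fold in range(k)]
    return sum(scores) / k


if __name__ == "__main__":
    data = make_noise_data(n=200, dims=5, seed=0)
    print("cross-validated accuracy:", cross_validated_accuracy(data, 10))
    print("train-on-all accuracy:   ", train_on_all_accuracy(data, 10))
```

    On this noise data the train-on-all score comes out at 100%, because each test row finds itself as its own nearest neighbour, while the cross-validated score hovers around chance. A Random Forest will not necessarily reach exactly 100% on its training data, but the direction of the bias is the same, which fits the jump from ~75% to ~97% reported above.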

    I strongly suggest that you reread the description of cross-validation in the RapidMiner tutorial and/or take a look at a good book.
    Here is another thread regarding cross-validation (only the first and second posts are relevant): http://rapid-i.com/rapidforum/index.php/topic,62.0.html

    greetings,

    Steffen
  • balamir · Member · Posts: 3 · Contributor I
    Thanks, steffen, for the warm welcome and quick reply. I'm aware of cross-validation, but I didn't make the connection. Your explanation clarified the difference between the two setups.