RapidMiner

IDF Calculation for Test Set

Contributor

Can anyone explain how the IDF values are calculated for a test set?
Are they based on the IDF values of the training set?
As far as I can tell, the test set uses only the word list from the training set, but the IDF is calculated solely from the test set itself. So if the test set contains only one document, the IDF could become 0, correct?
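To make the concern concrete, here is a minimal Python sketch (not RapidMiner code) that assumes the textbook formula idf = log(N / df), where N is the number of documents and df is the number of documents containing the term. RapidMiner's exact IDF variant is not stated in this thread and may differ, for example through smoothing. The function name `idf` and the sample tokens are hypothetical. If IDF is computed only from a one-document test set, then N = 1 and every term that occurs has df = 1, so each weight is log(1) = 0.

```python
import math

def idf(documents):
    """Plain IDF, log(N / df), computed only from the given documents.

    Assumption: textbook formula without smoothing; RapidMiner may use a variant.
    """
    n_docs = len(documents)
    doc_freq = {}
    for doc in documents:
        # Count each term once per document (document frequency); sorted for stable order.
        for term in sorted(set(doc)):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

# Hypothetical test set holding a single tokenized document.
single_doc_test_set = [["cheap", "flights", "cheap", "hotels"]]
print(idf(single_doc_test_set))  # {'cheap': 0.0, 'flights': 0.0, 'hotels': 0.0}
```

With only one document, the TF-IDF vector would therefore be all zeros, which is why the IDF statistics should not come from the test set alone.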
Regular Contributor

Re: IDF Calculation for Test Set

If you are using TF-IDF, you must store the model _and_ the word list after training.
To test or score unseen data, you have to preprocess it with exactly the same
"Process Documents" operator that you used for training, including the word list.
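As a conceptual analogue only (plain Python, not RapidMiner's internals), the sketch below shows the general idea of fitting the vocabulary and IDF weights once on the training documents and then reusing those stored weights to score unseen documents. The names `fit_idf` and `transform`, the raw-count TF, and the log(N / df) formula are assumptions chosen for illustration. Terms that are not in the training word list are simply dropped.

```python
import math
from collections import Counter

def fit_idf(train_docs):
    """Learn the word list and IDF weights from the training documents only.

    Assumption: idf = log(N / df) with no smoothing.
    """
    n_docs = len(train_docs)
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(doc))  # one count per document that contains the term
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def transform(doc, idf_weights):
    """Score one document with the stored training IDF weights.

    TF is the raw term count (an assumption); out-of-vocabulary terms are ignored.
    """
    tf = Counter(term for term in doc if term in idf_weights)
    return {term: count * idf_weights[term] for term, count in tf.items()}

# Hypothetical training corpus and a single unseen document.
train_docs = [["cheap", "flights"], ["cheap", "hotels"], ["luxury", "hotels"]]
idf_weights = fit_idf(train_docs)
test_vector = transform(["cheap", "cheap", "flights", "cruise"], idf_weights)
print(test_vector)
```

Because the IDF weights come from the three training documents, the single test document still gets nonzero weights for "cheap" and "flights", while "cruise" is ignored since it is not in the stored word list.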
Contributor

Re: IDF Calculation for Test Set

Thank you for the reply.