Can anyone explain how the IDF value is calculated for test sets? Is it based on the IDF of the training set? I see that the test set takes only the word list used by the training set, and that the IDF is calculated solely from the test set. So, if the test set contains only 1 document, there is a chance that the IDF becomes 0, correct?
If you are using TF-IDF, you must store both the model _and_ the word list after training. To test or score unseen data, you have to preprocess it with exactly the same "Process Documents" operator that you used for training, including the word list.
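To illustrate why the training word list matters, here is a minimal sketch in plain Python. It is not RapidMiner's implementation: it assumes the simple unsmoothed formula idf = log(N / df), and the helper names `fit_idf` and `tfidf` and the toy documents are made up for this example. It compares weighting a single test document with IDF values stored from training against IDF values recomputed from the test document alone.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Return {word: idf} with idf = log(N / df) over tokenized documents."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / count) for word, count in df.items()}

def tfidf(doc, idf):
    """Weight one tokenized document with a stored IDF table.

    Words missing from the table (not in the word list) are dropped.
    """
    counts = Counter(word for word in doc if word in idf)
    total = sum(counts.values()) or 1
    return {word: (count / total) * idf[word] for word, count in counts.items()}

# Hypothetical toy data.
train_docs = [
    ["cheap", "pills", "offer"],
    ["meeting", "agenda"],
    ["cheap", "flights", "offer"],
    ["project", "meeting", "notes"],
]
test_doc = ["cheap", "meeting", "unseen"]

train_idf = fit_idf(train_docs)       # stored after training (the "word list")
test_only_idf = fit_idf([test_doc])   # recomputed from the single test document

weights_from_train = tfidf(test_doc, train_idf)
weights_from_test = tfidf(test_doc, test_only_idf)

print(weights_from_train)
print(weights_from_test)
```

With a single test document every word has df = N = 1, so log(1 / 1) = 0 and all weights computed from the test set alone collapse to zero, which is the situation described in the question. Reusing the IDF table from training keeps the weights meaningful and drops words that were never seen in training.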