Apply Model: Testing & Training Sets Differ

Hyram
Hi
I am using Sentiment140 as my training and test data. The dataset's creators have already split it into two sets. I am performing training, cross-validation and testing separately: training and cross-validation on the training set, and testing on the test set. The problem is that after text preprocessing, the features in the test set don't line up with those in the training set, so I can't apply the trained model. The end product of my text preprocessing is a matrix in which each text is an example and each feature is the term frequency of a word, and the set of words differs between the training and test sets.
Do I somehow merge both sets so that the features are aligned and terms missing from a text get TF = 0?
Thanks
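Outside RapidMiner, the general idea of building the word list on the training set only and reusing it for the test set can be sketched in plain Python. This is a minimal illustration under assumed simplifications (whitespace tokenization, raw counts instead of TF or TF-IDF weighting); the helper names are hypothetical and not part of any RapidMiner API.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_vocabulary(docs: list[str]) -> list[str]:
    # The training set alone defines the word list; sorted for a stable column order.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def vectorize(docs: list[str], vocabulary: list[str]) -> list[list[int]]:
    # One row per document, one column per vocabulary term (raw counts).
    # Words outside the vocabulary are ignored; vocabulary terms absent from a doc get 0.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts.get(term, 0) for term in vocabulary])
    return rows


train_docs = ["good movie", "bad movie bad plot"]
test_docs = ["good plot twist"]

vocab = build_vocabulary(train_docs)    # ['bad', 'good', 'movie', 'plot']
X_train = vectorize(train_docs, vocab)  # [[0, 1, 1, 0], [2, 0, 1, 1]]
X_test = vectorize(test_docs, vocab)    # [[0, 1, 0, 1]]  ('twist' is dropped)
```

Because the test texts are vectorized against the training word list, both matrices have the same columns in the same order, and a training term that never occurs in a test text simply gets 0.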

Answers

  • Hyram
    edited July 2020
    Apologies, I see this was solved by Marius and Ingo in 2012. I was wondering: if you connect the word list output of Process Documents on the training leg to the word list input of Process Documents on the test leg, does the output of Process Documents on the test leg use the same TF values or zeros? The values carried through are indeed zero.
    Passing the word list output of the training leg works, but what if I process the data further after the Process Documents operator and reduce features with a Select by Weights operator?
  • Hyram
    Thanks very much @Telcontar120 and @jacobcybulski!
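The follow-up question about reducing features after Process Documents comes down to applying the same reduction on both legs: compute the weights and the selected term list on the training data only, then keep exactly those columns in the test data. Below is a minimal plain-Python sketch of that principle. The weighting used here (document frequency) and the threshold are assumptions for illustration, not what any particular RapidMiner operator computes, and all helper names are hypothetical.

```python
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def vectorize(docs, vocabulary):
    # Raw term counts against a fixed vocabulary; unseen words are ignored.
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts.get(term, 0) for term in vocabulary])
    return rows


def document_frequency_weights(rows, vocabulary):
    # Hypothetical weighting: number of training documents containing each term.
    return {
        term: sum(1 for row in rows if row[j] > 0)
        for j, term in enumerate(vocabulary)
    }


def select_by_weight(weights, min_weight):
    # Keep terms whose weight meets the threshold; sorted for a stable order.
    return sorted(term for term, w in weights.items() if w >= min_weight)


def project(rows, vocabulary, selected):
    # Keep only the selected columns, in the order given by `selected`.
    index = {term: j for j, term in enumerate(vocabulary)}
    return [[row[index[term]] for term in selected] for row in rows]


train_docs = ["good movie", "bad movie bad plot", "good plot"]
test_docs = ["good plot twist"]

# Word list, weights and selection all come from the training leg only.
vocab = sorted({tok for doc in train_docs for tok in tokenize(doc)})
X_train = vectorize(train_docs, vocab)
weights = document_frequency_weights(X_train, vocab)
selected = select_by_weight(weights, min_weight=2)  # ['good', 'movie', 'plot']
X_train_sel = project(X_train, vocab, selected)

# The test leg reuses the training word list, then the same selected columns.
X_test_sel = project(vectorize(test_docs, vocab), vocab, selected)  # [[1, 0, 1]]
```

Since the selected terms are a subset of the training word list, vectorizing the test texts directly against `selected` would give the same matrix; the important part is that the test data never influences which columns are kept.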