Perform text classification with separate test/train splits?

kashif_khan Member Posts: 19 Contributor II
edited February 2020 in Help
Hi,

I am a newbie dealing with text classification in RapidMiner. I have separate test/train splits, and I want to select the top k features by information gain (i.e., the features with the highest information gain). In general (without feature selection), we need to provide the output of the "Process Documents From Files" (wordlist) used for the train set to the "Process Documents From Files" (wordlist) that loads the test set. But how can we do the same if we need to apply feature selection to the train set and provide the reduced feature set as the vocabulary for the test split?

Kindly help. I searched a lot on the internet, but all the examples I found use n-fold cross-validation, and I couldn't figure out how to do this with dedicated test/train splits.


Answers

  • kashif_khan Member Posts: 19 Contributor II
    I figured it out myself ... with some help from Stack Overflow ...
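
    The thread does not record the actual solution, so the following is only a rough sketch of the general idea, written in plain Python rather than as a RapidMiner process: score every term by information gain on the train split only, keep the top k terms as the vocabulary, and then build the test vectors from that same reduced vocabulary. All function names and the toy data here are hypothetical, made up for illustration; term presence is treated as a binary feature when computing the gain.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def entropy(counts):
    # Shannon entropy (in bits) of a list of class counts.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    # Gain of the binary feature "term occurs in the document" w.r.t. the label.
    with_term, without_term = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (with_term if term in tokens else without_term)[label] += 1
    gain = entropy(list(Counter(labels).values()))
    for part in (with_term, without_term):
        size = sum(part.values())
        if size:
            gain -= (size / len(docs)) * entropy(list(part.values()))
    return gain


def select_top_k(train_texts, train_labels, k):
    # Rank terms using the TRAIN split only; ties are broken alphabetically.
    docs = [set(tokenize(t)) for t in train_texts]
    vocab = set().union(*docs)
    ranked = sorted(
        vocab, key=lambda term: (-information_gain(docs, train_labels, term), term)
    )
    return ranked[:k]


def vectorize(texts, vocabulary):
    # Term counts restricted to the given vocabulary; unknown words are dropped.
    rows = []
    for text in texts:
        counts = Counter(tokenize(text))
        rows.append([counts[term] for term in vocabulary])
    return rows


# Hypothetical toy data standing in for the dedicated train/test splits.
train_texts = [
    "cheap pills buy now",
    "buy cheap watches",
    "meeting agenda for monday",
    "monday project meeting notes",
]
train_labels = ["spam", "spam", "ham", "ham"]
test_texts = ["buy pills on monday", "unseen words only"]

vocabulary = select_top_k(train_texts, train_labels, k=4)
train_matrix = vectorize(train_texts, vocabulary)
test_matrix = vectorize(test_texts, vocabulary)  # same columns as train_matrix

print(vocabulary)
print(test_matrix)
```

    The point of the design is that the test split never influences which features are kept: it is only projected onto the vocabulary chosen from the train split, so both matrices end up with identical columns.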