rafeena Member Posts: 14 Contributor II
If I would like to calculate the entropy for each word, what should I set my word vector to during preprocessing? It would not be advisable to set it to TF-IDF, right?

Best Answer


    Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Can you clarify what you mean by calculating the entropy of each word? Vectorization is simple, unsupervised preprocessing of text, whereas entropy is usually computed with respect to a label, so there is no built-in vector metric that supplies anything like a conventional entropy measure. If you are asking which vector to use so that you can calculate entropy later, I would think simple term occurrences would be the appropriate one, since that is merely a count of all instances of a given token in a given document.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
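
To make the point about entropy "with respect to a label" concrete, here is a minimal sketch in plain Python, outside RapidMiner. It assumes each document has a class label and a term-occurrence count for one token, and it scores the token by information gain, that is, the drop in label entropy after splitting documents on whether the token is present. The function names and the toy data are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(term_counts, labels):
    """Drop in label entropy after splitting documents on term presence.

    term_counts[i] is the number of occurrences of one token in document i;
    labels[i] is that document's class label.
    """
    present = [y for c, y in zip(term_counts, labels) if c > 0]
    absent = [y for c, y in zip(term_counts, labels) if c == 0]
    n = len(labels)
    conditional = (len(present) / n) * entropy(present) \
        + (len(absent) / n) * entropy(absent)
    return entropy(labels) - conditional


# Hypothetical toy data: one token's counts in four labeled documents.
term_counts = [2, 1, 0, 0]
labels = ["spam", "spam", "ham", "ham"]
print(information_gain(term_counts, labels))
```

Because this sketch only tests whether a count is greater than zero, it needs nothing more than raw term occurrences as input.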
    rafeena Member Posts: 14 Contributor II
    I would like to use entropy and TF-IDF as my feature selection methods. Will it affect the entropy result if I set the word vector to TF-IDF?