"Text Mining and the Word List"

mschmitz (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor), RM Data Scientist, Posts: 2,412
edited June 2019 in Knowledge Base

Symptoms

Using Process Documents from Data, you can generate a tokenized example set from a given set of documents. If you use one Process Documents operator for training and another for testing, you might get the error "incompatible number of attributes" when you apply the model.

Diagnosis

The problem is probably that you did not transfer the word list from one Process Documents operator to the other. The word list contains mainly two pieces of information:

  • Which attributes to generate
  • The normalization

If you do not transfer the word list, words that do not occur in the documents of the second operator will not create an attribute there. If you use pruning, different words will be deleted from the bag of words in each operator. Another effect is that, even if both operators create the same attributes, the normalization (for example, the IDF part of TF-IDF) might differ because it is computed from a different set of documents.
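The effect can be reproduced outside of RapidMiner with a few lines of Python. This is only a minimal sketch: `build_word_list` is a hypothetical helper, not a RapidMiner API, and it assumes whitespace tokenization and a plain log(N/df) IDF, which may differ from what Process Documents computes internally.

```python
import math

def build_word_list(docs):
    """Return {token: idf} learned from docs (hypothetical helper, not a RapidMiner API)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    vocab = sorted({tok for toks in tokenized for tok in toks})
    # Assumed IDF definition: log(number of documents / documents containing the token)
    return {
        tok: math.log(n_docs / sum(tok in toks for toks in tokenized))
        for tok in vocab
    }

train_docs = ["good movie", "bad movie", "good plot"]
test_docs = ["great movie", "bad acting scenes"]

# Two independent word lists, as with two unconnected Process Documents operators
train_words = build_word_list(train_docs)
test_words = build_word_list(test_docs)

print(sorted(train_words))  # attributes the model was trained on
print(sorted(test_words))   # attributes built independently from the test set
print(train_words["movie"], test_words["movie"])  # same word, different IDF weight
```

The two word lists differ both in their tokens and in the weight of the shared token "movie", which mirrors the two problems described above.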

 

Solution

[Image: Wordlist.png]

Transfer the word list created in your training stream over to the application stream. Thus you create the same attributes, with the same normalization, in both streams.
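The same idea, sketched in Python for illustration: the word list is learned once from the training documents and then reused for the application documents. The helpers `fit_word_list` and `apply_word_list` are hypothetical names, and the tokenization and TF-IDF formula are simplifying assumptions rather than RapidMiner's exact behavior.

```python
import math

def fit_word_list(docs):
    """Learn {token: idf} from the training documents (hypothetical helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    vocab = {tok for toks in tokenized for tok in toks}
    return {
        tok: math.log(n_docs / sum(tok in toks for toks in tokenized))
        for tok in sorted(vocab)
    }

def apply_word_list(docs, word_list):
    """Build TF-IDF rows with exactly one column per word-list entry.

    Tokens that are not in the word list are ignored, so the column
    layout never depends on the documents being vectorized.
    """
    columns = list(word_list)
    rows = []
    for doc in docs:
        toks = doc.lower().split()
        length = max(len(toks), 1)  # avoid division by zero for empty documents
        rows.append([toks.count(col) / length * word_list[col] for col in columns])
    return columns, rows

train_docs = ["good movie", "bad movie", "good plot"]
test_docs = ["great movie", "bad acting scenes"]

word_list = fit_word_list(train_docs)  # created in the training stream
train_cols, train_rows = apply_word_list(train_docs, word_list)
test_cols, test_rows = apply_word_list(test_docs, word_list)  # reused in the application stream

print(train_cols == test_cols)  # both example sets share the same attributes
```

Because the application stream never builds its own word list, unseen words such as "great" are simply dropped and the model receives the attributes it was trained on.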

- Head of Data Science Services at RapidMiner -
Dortmund, Germany