WordList (Process Documents from Data): word count
Using the Process Documents from Data operator, the word list output is a table listing each word with its Total Occurrences and Document Occurrences.
However, in the sample process "Applying a Model to categorize Documents" (under RM Academy) we also get additional columns for the classes/categories. In that process there are two such columns, named unknown and food/beverage/hospitality.
When you use WordList to Data, these columns are labelled inclass (unknown) etc.
I get all zero values in both columns, no matter which vector creation method I use (I use Term Occurrences). What should be changed to get the words counted for both classes?
Thank you.
Answers
I have run the process you were referring to (assuming this is the one) and haven't been able to reproduce the issue. Can you share your process or send a screenshot? See details on how to do this here: https://community.rapidminer.com/discussion/37047
Did you watch the related video? https://academy.rapidminer.com/learn/video/applying-a-model-to-categorize-documents
Thanks, Knut
I finally found the time to look into it. The "0" values are caused by the "Extract content" operator inside "Process Documents from Data". Go into the Parameters of that operator and untick the first entry, called "extract content". If you do that and run the process again, you will see that the columns get populated and show the total occurrences for each of the two classes ("unknown" and "food/beverage..."). That output could be used, for example, to generate a custom pruning mask that reduces the data of the class which is not of interest, but I guess there are other creative options too.
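For anyone unsure what the populated class columns should contain, here is a minimal Python sketch of the counting idea. It is not RapidMiner's implementation: the function names `build_wordlist` and `pruning_mask`, the whitespace tokenizer, the sample documents and the "keep a word if it occurs in the class of interest" rule are all assumptions made up for illustration. Only the column naming follows what WordList to Data shows in this thread.

```python
from collections import Counter, defaultdict

# Made-up sample documents as (text, class label) pairs, for illustration only.
docs = [
    ("coffee and tea menu", "food/beverage/hospitality"),
    ("tea room hotel", "food/beverage/hospitality"),
    ("quarterly tax report", "unknown"),
    ("tea tax", "unknown"),
]


def build_wordlist(documents):
    """Return {word: row} with total, document and per-class occurrences."""
    total = Counter()              # how often the word occurs overall
    doc_occ = Counter()            # in how many documents it occurs
    in_class = defaultdict(Counter)  # occurrences per class label

    for text, label in documents:
        tokens = text.lower().split()  # naive tokenizer (assumption)
        total.update(tokens)
        doc_occ.update(set(tokens))    # count each document at most once
        in_class[label].update(tokens)

    classes = sorted(in_class)
    return {
        word: {
            "total": total[word],
            "documents": doc_occ[word],
            **{f"inclass ({c})": in_class[c][word] for c in classes},
        }
        for word in sorted(total)
    }


def pruning_mask(wordlist, keep_class):
    """Hypothetical rule: keep a word only if it occurs in keep_class."""
    column = f"inclass ({keep_class})"
    return {word: row[column] > 0 for word, row in wordlist.items()}


if __name__ == "__main__":
    wordlist = build_wordlist(docs)
    for word, row in wordlist.items():
        print(word, row)
    print(pruning_mask(wordlist, "food/beverage/hospitality"))
```

With this toy data, a word such as "tax" occurs only in the "unknown" class, so the mask would drop it when "food/beverage/hospitality" is the class of interest. If every class column is zero while the totals are not, the class information is not reaching the counting step, which matches the symptom described in the question.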
You are now probably wondering why the extract content operator is causing the empty values and my answer is: I don't know. But without having more details I'd say it feels like a bug to me so I will send this to our developers. Hope this helps!
Cheers, Knut