"I'm going nuts -- word vector frequency by category"

apierce Member Posts: 4 Contributor I
edited June 2019 in Help
I'm processing documents from files and categorizing each document by the folder it comes from.  When I run the process and request a word list result, I get a great word list table showing all the words from my process, with "Total Occurrences" and "Document Occurences" as columns.  The table also includes a column for each of my categories, but every cell in those category columns shows 0 rather than what I want, which is the total occurrences of the word in that category.  I'm sure I'm missing a simple operator to obtain this result, but I can't figure it out.  Any help would be appreciated.
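
To make the expected table concrete, here is a minimal sketch in plain Python of the counts being asked for. It is only an illustration, not how RapidMiner builds its word list: the function name `word_table`, the letters-only tokenizer, and the sample documents and category names are all assumptions made up for this example.

```python
import re
from collections import Counter, defaultdict


def word_table(docs):
    """Build word-list rows from an iterable of (category, text) pairs.

    Hypothetical helper: each row holds the total occurrences, the number
    of documents containing the word, and one count per category.
    """
    total = Counter()               # occurrences across all documents
    doc_occ = Counter()             # number of documents containing the word
    per_cat = defaultdict(Counter)  # occurrences within each category

    for category, text in docs:
        # Assumed tokenizer: lowercase runs of ASCII letters only.
        tokens = re.findall(r"[a-z]+", text.lower())
        total.update(tokens)
        doc_occ.update(set(tokens))
        per_cat[category].update(tokens)

    categories = sorted(per_cat)
    return {
        word: {
            "total": total[word],
            "documents": doc_occ[word],
            # A Counter returns 0 for words a category never saw.
            **{cat: per_cat[cat][word] for cat in categories},
        }
        for word in sorted(total)
    }


# Made-up sample: the category stands in for the source folder name.
docs = [
    ("sports", "The match was a great match"),
    ("sports", "Great goal"),
    ("politics", "The vote was close"),
]
table = word_table(docs)
print(table["match"])
```

In this sketch a category column is 0 only when the word never appears in that category's documents, which is the behavior I expected from the word list.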

    apierce Member Posts: 4 Contributor I
    I've isolated this a bit more.  I get the proper results if I remove the "Extract Content" operator from the process.  Why would this change the per-category frequency results?  Is there any other way to get the frequencies for the categories after extracting the HTML code?
    MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi Andy,

    can you please post your process setup as described in the post linked in my signature?

    Best regards,